Rajiv Shah

Data Scientist

Videos

Find current videos at TikTok, Instagram, or YouTube.
I have a Google Sheet that provides links and descriptions to many of my shorter videos.

Video Links

Deep Dive Videos

Evaluation for Generative AI - A simply explained starting point, Youtube (May 2025)

Using Reasoning LLMs (Claude with Python or Agno), Youtube (Apr 2025)

Get Started with Deepseek's GRPO using QWEN and Hugging Face, Youtube (Feb 2025)

Unit Testing for Natural Language (LLMs) + LMUnit model, Youtube (Feb 2025)

Training Kolmogorov-Arnold Networks (KAN) using Pytorch and Nixtla on M3/M4 Time Series Datasets, Youtube (Nov 2024)

Feature Selection Methods for Machine Learning, plus Feature Selection Curves, Youtube (Oct 2024)

Start using Llama 3.2 Vision Models with Hugging Face Transformers (on Snowflake), Youtube (Oct 2024)

Practical Lessons in Building Generative AI: RAG and Text to SQL, Youtube (Sep 2024)

Spark of AI: How Transfer Learning Unlocked AI's Potential, Youtube (Sep 2024)

Rules: A Simple & Effective Machine Learning Approach, Youtube (Aug 2024)

Intro to Generative AI and Trends (March 2024), Youtube (Jun 2024)

Model Interpretability and Explainability for Machine Learning Models, Youtube (Jun 2024)

Large Language Models (LLMs) Can Explain Their Predictions, Youtube (Jan 2024)

Evaluation for Large Language Models and Generative AI - A Deep Dive, Youtube (Nov 2023)

NanoGPT using Simpsons Data: Get Started with Large Language Models, Youtube (Sep 2023)

16 Challenges for LLMs - Paper Highlights, Youtube (Aug 2023)

Llama 2 Paper Explained, Youtube (Jul 2023)

GPT or BERT? Reviewing the tradeoffs of using Large Language Models versus smaller models, Youtube (Jun 2023)

Building Better Large Language Models - Key Concepts for Prompting and Fine Tuning, Youtube (Apr 2023)

Efficient Large Language Model training with LoRA and Hugging Face PEFT, Youtube (Mar 2023)

Text style transfer in a spreadsheet using Hugging Face Inference Endpoints, Youtube (Nov 2022)

SetFit: Few Shot Learning for Text Classification, Youtube (Oct 2022)

Prediction Intervals with Conformal Inference: An Intuitive Explanation, Youtube (Sep 2022)

LayoutLMv3 Training with CORD (receipts dataset), Youtube (Sep 2022)

Fine Tuning an Image Classifier on Indian Food Images, Youtube (Aug 2022)

Explanation Approaches for Transformers, Youtube (Aug 2022)

Short Form Videos

Netflix $1 million dollar prize, 1/9/22, TikTok YouTube

Netflix $1 million dollar prize #datascience #ai #netflix #PepsiApplePieChallenge

Optimal punt return, 1/10/22, TikTok YouTube

Optimal punt return #datascience #nfl

How computers think: Classification and Regression, 1/11/22, TikTok YouTube

How computers think: Classification and Regression. #datascience #machinelearning #fruitbowl

i love notebooks, 1/12/22, TikTok

i love notebooks #notebooks #programming #rstats

Accuracy is not your friend for most problems, 1/13/22, TikTok

Accuracy is not your friend for most problems #datascience #media #norythm

datasaurus, 1/14/22, TikTok YouTube

No big deal, use visualization #stats #datascience #datasaurus #datascience #analytics #anscombe #visualization

Intro for AI Literacy, 1/14/22, TikTok

Intro for AI Literacy #datascience #machinelearning #ai #programming #literacy #alleninstitute

Pie chart fails, 1/15/22, TikTok

Pie chart fails #stats #datascience #datavisualization #piechart #analytics #fails

datasaurus, 1/16/22, TikTok

Reply to @midnightlibrarian #datasaurus #stats #dinosaur #analytics explaining #anscombe

AI Literacy, Question 1, 1/16/22, TikTok

AI Literacy, Question 1, can AI think by itself? #ai #datascience #programming #counsciousness #alleninstitute #literacy #capcut

datasaurus, 1/17/22, TikTok

Reply to @fondantlover datasaurus dozen howto #stats #anscombe #datascience #analytics

AI Literacy, Q2, Driverless cars, 1/18/22, TikTok

AI Literacy, Q2, Driverless cars #ai #datascience #driverless #cars #tesla #fsd #alleninstitute

2022 plans, 1/19/22, TikTok

2022 plans #2022 #datascience #ai #machinelearning #skillup start with AI Literacy @rajistics

Regression to the mean, 1/20/22, TikTok

Regression to the mean with the Madden Curse and Sports Illustrated Jinx #datascience #analytics #stats #maddencurse #sijinx #regression

Madden Curse, 1/21/22, TikTok

Logical song full explanation here: @rajistics #sijinx #maddencurse #stats #analytics #regression

Crowdsource labor, 1/22/22, TikTok

Crowdsource labor for #ai #machinelearning - longer video explaining this coming out later today.

Crowdsource labor, 1/23/22, TikTok

Reply to @garlic_gworl #fakeai #datascience #mechicalturk #aiethics #labor #ai

Comparing Logistic to GBM, 1/24/22, TikTok

Comparing algorithms spiral dataset, #datascience #machinelearning #algorithms #gbm #logisticregression

self driving cars, 1/25/22, TikTok

self driving cars and data quality - LOA - #datascience #machinelearning #selfdrivingcar #stanford #data #stats

Data science work is hard to schedule and plan, 1/26/22, TikTok

Data science work is hard to schedule and plan. It conflicts with agile methods. #datascience #machinelearning #dataanalytics #agile #scrummaster

Tracking Covid, 1/27/22, TikTok

Tracking Covid - #datascience #analytics #wastewater #covid19 #data #monitoring

Joys of autocomplete, 1/28/22, TikTok

Joys of autocomplete, who is with me? #datascience #programming #vscode #jupyternotebook #coding #tabcompletion #python

Career Day, 1/29/22, TikTok

Machine learning engineer growing career #machinelearning #datascience #dataengineering #programming #ai #career #programmingbootcamp #statistics

So many hyperparameters, 1/30/22, TikTok

So many hyperparameters - this is from pytorch forecasting #datascience #machinelearning #hyperparameters #coding #algorithms #modeling

analyzing wastewater, 1/31/22, TikTok

Reply to @declinedher long history of analyzing wastewater for drug residue #dataanalysis #wastewater #drugs #monitoring @rajistics

Earthquake visualization, 2/1/22, TikTok

Earthquake visualization from lazarusA #datascience #datavisualization #visualization #julia #python #earthquakes

Scrum, 2/2/22, TikTok

Reply to @mrjohnlueders #scrum #datascience #agile #softwareengineering #analytics

analyzing wastewater, 2/3/22, TikTok

Reply to @grahamkechnie #wastewater #cornovirus #analysis #cryptic

Tensorflow playground, 2/4/22, TikTok

Tensorflow playground, link in comments, #tensorflow #deeplearning #datascience #analytics #neuralnetworks

tensorflow playground, 2/5/22, TikTok

Reply to @bosstoastmaker tensorflow playground data > models #featureengineering #datascience #tensorflow #deeplearning #analytics #ai

Two types of mistakes in classification, 2/6/22, TikTok

False positive and false negative #datascience #statistics #decionmaking #classificationalgorithm #algorithm

Regression to the mean, 2/8/22, TikTok

History of the term regression and regression to the mean #statistics #datascience #galton #regression #heriditary

Fungible Data (Not), 2/9/22, TikTok

Datasets have worldviews from Google PAIR, link in comments, #datascience #bias #machinelearning #ethics #pair-google #statistics

Above Average, 2/10/22, TikTok

Reply to @declinedher being above average. I will add citation in the comments. #statistics #regressiontothemean #aboveaverage

Median versus Mean, 2/11/22, TikTok

Reply to @noleli median versus mean

Above Average, 2/11/22, TikTok

Reply to @notryantaylor here it is without music. this is for my 4 kids who all text me that they are above average.

NVIDIA GPU, 2/12/22, TikTok

Dreams of a better GPU #gpu #nvidia #deeplearning #gaming #datascience

Above Average Part 2, 2/13/22, TikTok

Being above average part II. Cite in comments. @rajistics #statistics #regressiontothemean #aboveaverage

Tuskegee Airman, 2/14/22, TikTok

Tuskegee Airman by geo karamanis links to code in comments #TidyTuesday #rstats #datascience #datavisualization

Talk business not data science metrics, 2/16/22, TikTok

Talk business not data science metrics to have a business impact #datascience #machinelearning #statistics #analytics

Tensorflow Playground Part 1, 2/17/22, TikTok

Shallow learning with tensorflow playground #datascience #tensorflow #python #machinelearning #deeplearning

Beer and diapers, 2/18/22, TikTok

Beer and diapers story of association of products. #datascience #recommendationsystems #marketing #analytics #correlation

Starting Simple, 2/18/22, TikTok

Reply to @coronavirusvevo #xgboost #regression #statistics #datascience #algorithms

Tensorflow Playground Part 2, 2/20/22, TikTok

Reply to @bosstoastmaker shallow learning with tensorflow playground #datascience #tensorflow #python #machinelearning

Data Scientist title, 2/22/22, TikTok

The Data Scientist title is worth $$$ €€€ £££ ¥¥¥. #datascience #dataanalyst #analytics

Don’t do analysis for the sake of analysis, 2/22/22, TikTok

Don’t do analysis for the sake of analysis. Your analysis should be synced with a business objective. #datascience #analysis #dataanalyst

AI Literacy Survey, 2/23/22, TikTok

Q 4,5, and 7 from the Allen Institute survey #datascience #medialiteracy #ai @rajistics @rajistics @rajistics

Ukraine, 2/25/22, TikTok

Some things are bigger than data science, I have a personal connection here and have to express my support. #ukraine #priceoffreedom #datascience

Regression to the Mean, 2/27/22, TikTok

Reply to @pal_protty negative reinforcement and #regressiontothemean . Link to the Nylon calculus #basketball article in comments. #datascience

Classification outcomes and probabilities, 3/2/22, TikTok

Classification outcomes and probabilities #datascience #machinelearning #algorithms

Tensorflow Playground Part 3, 3/4/22, TikTok

Earlier videos: @rajistics @rajistics #deeplearning #tensorflow #datascience #analytics

Reading Documentation, 3/5/22, TikTok

What tools are way too hard to use? #datascience #statistics #analytics

Understanding a confusion matrix, 3/7/22, TikTok

Understanding a confusion matrix, Part I video: @rajistics #datascience #statistics #machinelearning #confusionmatrix

Give it up to Data Engineers, 3/8/22, TikTok

Give it up to Data Engineers. #dataengineering #datascience #analytics

Data Engineering Intro, 3/9/22, TikTok

Reply to @nolankeller23

Survivorship bias, 3/10/22, TikTok

#datascience #analytics #statistics #wald #survivorshipbias

Excel, 3/11/22, TikTok

I know the pain. But there are ways to make it easy for people to use your code. #python #analysis #datascience

Languages, 3/12/22, TikTok

Reply to @bird_3288 #python #rstat #datascience #analytics #programming

Cryptic error messages, 3/12/22, TikTok

Cryptic error messages. Cmon. Give it up for actionable error messages that make coding a downhill sport.

Model Risk Management, 3/15/22, TikTok

Model Risk Management (MRM), important but can be frustrating. #datascience #regulatedindustries #explainability #statistics

Profit Curve, 3/15/22, TikTok

Profit Curve, See earlier parts on Classification Metrics here: @rajistics @rajistics #datascience #statistics #confusionmatrix

Fairness in models, 3/19/22, TikTok

Fairness in models #datascience #analytics #fairnessml #bias #algorithms

Time Series, 3/19/22, TikTok

It’s happened! Time series #datascience #timeseries #analytics #statistics

SPSS, 3/20/22, TikTok

I was rocking with SPSS back in 2009. I didn’t start using R until a few years later. We had to pay $$$ for a basic regression. #datascience #spss #statistics

Data visualization tips, 3/21/22, TikTok

Data visualization tips #datascience #dataviz #analytics #datavisualization

Poorly prepped data, 3/22/22, TikTok

The pain. Data munging on poorly prepped data. #datascience #analytics #csv

Stackoverflow, 3/24/22, TikTok

We all play roulette with stackoverflow. #programming #datascience #python

Predicting NCAA basketball, 3/25/22, TikTok

Predicting NCAA basketball #marchmadness #datascience #sportsanalytics #illinois

Myth versus Reality, 3/25/22, TikTok

Myth versus Reality. #sql #datascience #analytics

Learn Regex, 3/26/22, TikTok

Learn Regex, it will pay off #regex #datascience #programming #analysis

Get a cloud server, 3/27/22, TikTok

Do it! Get a server in the cloud. Build your skills. #datascience #programming #analytics #digitalocean

Acceptance Journey in Python Programming, 3/28/22, TikTok

It’s taken a while to accept this. #python #programming #datascience

GPU bills, 3/30/22, TikTok

It happens. Be careful. #aws #datascience #deeplearning #gpu

Unit Tests, 4/1/22, TikTok

I am awful about writing tests. This is why I don’t write production code. #datascience #cstok #programminghumor #codetok

Reinforcement learning with my Eat Melon! Demo, 4/5/22, TikTok YouTube

Reinforcement learning with my Eat Melon! Demo based on Karpathy #datascience #reinforcementlearning #techtok #machinelearning

Baseline Model, 4/6/22, TikTok

The Agony! #datascience #machinelearning #mltok #techtok #statistics

Powerpoint, 4/12/22, TikTok

I hope this pain isn’t shared widely #techtok #powerpoint #datascientist

SKlearn Playground, 4/12/22, TikTok

SKlearn Playground #datascience #machinelearning #statistics #techtok #sklearn

Rerunning your old code, 4/12/22, TikTok

Rerunning your old code #datascience #techtok #programming #analytics

Junior versus senior data scientist, 4/14/22, TikTok

Go explore if you are new #datascience #techtok #analytics

Open source can be a lot of work, 4/15/22, TikTok

Open source can be a lot of work #opensource #techtok #programming #python #github

Documented code, 4/15/22, TikTok

I need more time to code. #datascience #programming #techtok #python

AI is using input for reinforcement learning, 4/18/22, TikTok

Reply to @canutten1 Deep Q with Atari Breakout #datascience #reinforcementlearning #techtok #machinelearning

Many ways to learn data science, 4/20/22, TikTok

Reply to @anthonycomputer Dive in and start! Lots of great stuff out there. #datascience #techtok #analytics

SAS users, 4/21/22, TikTok

Any old school SAS users out there? #datascience #statistics #sas

Executive updates, 4/22/22, TikTok

It’s exasperating. #techtok #datascience #programming

Learn about foundational models, 4/23/22, TikTok YouTube

Learn about foundational models, especially in #nlp #naturallanguageprocessing #datascience #deeplearning #analytics #techtok #openai

Resources for transformer models, 4/23/22, TikTok

Reply to @sqwadiladida resources for learning about transformer models in #naturallanguageprocessing #datascience #techtok #statistics #analytics

Kaggle, 4/24/22, TikTok

Reply to @ereb0s_rl #datascience #analytics #techtok #rstats #kaggle #fastair #machinelearning

Reinforcement Learning Class, 4/26/22, TikTok

Hugging Face #reinforcementlearning class #datascience #techtok #deeplearning #python

Imposter Syndrome, 4/28/22, TikTok

Facts #datascience #techtok #analytics #impostersyndrome

Daily activity plan, 4/28/22, TikTok

You can’t make this stuff up. Can I just say modeling? #datascience #analytics #statistics #scrum #techtok

Red light cameras, 4/30/22, TikTok

Red light camera #chicago #datascience #redlightcamera #anomalydetection #statistics #techtok #analytics

Going with the flow, 5/1/22, TikTok

Facts. We need data. #datascience #statistics #analysis #techtok

Anomaly Detection, 5/1/22, TikTok

Reply to @misho9000 anomaly detection is hard #datascience #statistics #techtok #anomalydetection #machinelearning

Pyscript, 5/2/22, TikTok

This will be fun! #python #codetok #datascience #programming

Responding to @shaggy335 on Tech Topics, 5/3/22, TikTok

Reply to @shaggy335 #datascience #statistics #analytics #techtok #machinelearning

AWS Charge, 5/4/22, TikTok

Those GPUs. #datascience #codetok #python #analytics #aws

Software engineering writes ML models, 5/4/22, TikTok

This happens. #datascience #machinelearning #python #codetok #programming

Stackoverflow down, 5/5/22, TikTok

credit to Gavin from work - #codetok #stackoverflow #programming #python

You dot com Code view, 5/6/22, TikTok

Very excited and Richard isn’t paying me for this - #codetok #youdotcom #codingtiktok #python

How I use Github, 5/7/22, TikTok

Reply to @milekumulator how I use GitHub #datascience #github #codetok #python #sportsanalytics

Content with money, 5/10/22, TikTok

It’s tough to be content #codetok #techtok #datascience #programming

Software licensing, 5/10/22, TikTok

Software licensing #github #codetok #gpl #programming #python #creativecommons #copyright

AI Predicting social outcomes, 5/11/22, TikTok

Predicting social outcomes is not doable by #ai #ethics #bias #datascience #statistics #snakeoil

Week 1 Reinforcement Learning Class, 5/12/22, TikTok

Week 1. #reinforcementlearning #huggingface #datascience #python #codetok #programming

Chronicles of OPT Training, 5/12/22, TikTok

Chronicles of OPT Training #meta #nlp #datascience #machinelearning #deeplearning #codetok #python

Data analyst versus data scientist, 5/13/22, TikTok

It works. #datascience #analytics #codetok #statistics #dataanalyst

History of XGBoost, 5/14/22, TikTok

#xgboost short history. #datascience #statistics #machinelearning #codetok

Open Source versus Explainability, 5/15/22, TikTok

#opensource #explainability #datascience #statistics #codetok #programming my intro video: @rajistics

Python dependency Hell, 5/16/22, TikTok

Use it! #python #conda #codetok #datascience

Conway's law, 5/17/22, TikTok

I have lived this. #conwayslaw #softwaredevelopment #codetok #programming

Type I or Type II error, 5/19/22, TikTok

Favorite tweet today. #statistics #datascience #codetok #machinelearning

Regulating AI: Insurance, 5/21/22, TikTok

#insurance #regulation #datascience #statistics #interpretablemodels #codetok

Bias in medical imaging, 5/22/22, TikTok

Bias in Medical Imaging #datascience #codetok #algorithmicbias #imaging #machinelearning #bias motivated by the comments from @rajistics

Let the data speak, 5/23/22, TikTok

From Spiegelhalter interview on Artists of Data Science podcast #datascience #statistics #codetok #dataanalysis

Security Training, 5/24/22, TikTok

My favorite was a training on how to use zoom #securitytraining #codetok

Topic Classification Bertopic, 5/25/22, TikTok

Highlighting BerTopic #datascience #statistics #nlp #huggingface #codetok

Changing your Model, 5/25/22, TikTok

It has happened. #datascience #codetok #machinelearning #analytics

Reinforcement Learning Class 1, 5/26/22, TikTok

#reinforcementlearning #huggingface #datascience #deeplearning #codetok #deepqlearning - Week 1: @rajistics

Zero Shot Learning, 5/27/22, TikTok

Zero shot learning #datascience #machinelearning #huggingface #nlp #naturallanguageprocessing #statistics background: @rajistics #codetok

Deep Learning, 5/29/22, TikTok

Reply to @zythesciguy #datascience #statistics #codetok

Parquet and Arrow file formats, 5/31/22, TikTok

Parquet and Arrow file formats #datascience #analytics #bigdata #codetok #dataengineer

Why you should use group partitioning, 6/2/22, TikTok

Why you should use group partitioning #datascience #machinelearning #statistics #codetok #deeplearning #andrewng

Data Science Projects Page, 6/3/22, TikTok

Keep your data science projects page loaded by making open source versions of your work. #datascience #codetok #programming #python

Business impact, 6/4/22, TikTok

Your regular reminder that you should translate the impact of your model into something your stakeholders care about. #datascience #statistics #analytics #codetok

Document AI with LayoutLM, 6/5/22, TikTok

Document AI with LayoutLM #datascience #codetok #naturallanguageprocessing #layoutml #huggingface #🤗 #ocr #deeplearning #multimodal

ML Lifecycle, 6/7/22, TikTok

I was so focused! Data is hard. #datascience #dataanalysis #statistics #codetok #mltok #machinelearning

Level set expectations early!, 6/7/22, TikTok

Level set expectations early! People have unrealistic views. #datascience #dataanalytics #statistics #codetok

Working at Hugging Face, 6/8/22, TikTok

#duet with @hugging_face looks like we are on Tik Tok. Go try mini Dalle, go to hF.co

Transformers, 6/10/22, TikTok

Transformers aren’t new anymore #datascience #codetok #deeplearning #machinelearning #statistics

GDG DevFest Ukraine, 6/11/22, TikTok

GDG DevFest Ukraine, sign up! #datascience #codetok #huggingface #dallemini #bigscience #devfestforukraine #standwithukraine

Time Series Decomposition, 6/12/22, TikTok

Other tips I should share? #datascience #timeseries #statistics #dataanalysis #python #codetok #mltok

Titanic, 6/15/22, TikTok

Good times, what was your first ML model? #titanic #datascience #statistics #codetok #machinelearning #rstats #python

Error Analysis 3 Tips, 6/17/22, TikTok

What are your favorite tips for error analysis? #datascience #statistics #analytics #machinelearning #codetok #mltok

Error Analysis, 6/19/22, TikTok

Reply to @mat.cov05 annotator agreement puts a ceiling on your model performance #datascience #statistics #codetok

Reinforcement Learning Class Week 2, 6/19/22, TikTok

Try it out, link in comments. #huggingface #datascience #reinforcementlearning #deeplearning #codetok #mltok Earlier weeks: @rajistics @rajistics

Data Science and ML news, 6/20/22, TikTok

#stitch with @debtcollective Marketing and PR. This is a big topic and a lot of nuance isn’t in this video. Also relationships with academia. in #datascience #machinelearning #codetok #mltok

Forecasting Model, 6/21/22, TikTok

Never underestimate the power of the status quo #datascience #forecasting #statistics #SAS #python #codetok

DataRobot, 6/22/22, TikTok

I have a lot more tea #datarobot #corporategreed #datascience #codetok #techtok

Practical Data Science, 6/24/22, TikTok

What kind are you? #datascience #statistics #python #codetok #mltok #practicaldatascience

Spotify Annoy for Similarity Searching, 6/26/22, TikTok

How are you using similarity search? #nearestneighbor #annoy #spotify #datascience #statistics #codetok #python #similaritysearch

Ode to Shap, 6/28/22, TikTok

One of my favorites for #explainability #datascience #statistics #interpretability #codetok #python #machinelearning

Untitled Notebooks, 6/30/22, TikTok

I have done a lot of good work in untitled python notebooks. #datascience #machinelearning #python #codetok #thosethatgetitgetit

Tensorboard Embedding Projector, 7/1/22, TikTok

Working with embeddings today. #datascience #word2vec #embeddings #tensorflow #codetok #tensorboard

Predicting Crime, 7/2/22, TikTok

Crime seems easy to predict, but is super messy. #datascience #crimetok #chicago #statistics #crimonology #machinelearning #codetok #aisnakeoil

Model Explainability, 7/4/22, TikTok

Who enjoys explaining how ML models work? #machinelearning #datascience #statistics #codetok

Semantic search, 7/5/22, TikTok

Replying to @darianv19 semantic search versus lexicon search. Embeddings help power semantic search. #datascience #embeddings #python

Data Scientist in two weeks, 7/6/22, TikTok

No click bait on this account. Feeling sick today (and upset between Roe and Highland). Mailing it in today. #datascience #statistics #analytics #codingbootcamp

Poisson Distribution, 7/8/22, TikTok

Lots of real world problems, it pays to know distributions like tweedie. Still sick, so you get old tik tok from my drafts. #datascience #statistics #acturialscience #codetok

Leakage, 7/10/22, TikTok

Watch out for leakage, it happens even to the best. #datascience #statistics #dataleakage #targetleakage #machinelearning

Object Detection, 7/12/22, TikTok

Back! Time for AI on images. #datascience #computervision #objectdetection #yolo #machinelearning #codetok

When IT comes Knocking, 7/13/22, TikTok

What did I do this time? I hope your IT experiences go much better.

Reinforcement Learning, 7/15/22, TikTok

Long video in comments, #huggingface #datascience #reinforcementlearning #deeplearning #codetok #mltok Earlier weeks: @Rajiv Shah @Rajiv Shah

Decision Trees, 7/15/22, TikTok

Trees are so nice to work with, but don't forget these steps for other algorithms. #datascience #xgboost #randomforest #statistics #machinelearning #codetok

reinforcement learning video from Week 4, 7/15/22, TikTok

Replying to @Rajiv Shah long version of deep reinforcement learning video from Week 4

Data Science Tips, 7/16/22, TikTok

Replying to @minisdlatvia my big tip for learning data science #datascience #machinelearning #analytics #codetok #webapps #gradio #streamlit #python #rstats

Universal Translator, 7/19/22, TikTok YouTube

Quick intro, let me know if a deeper dive is useful. #translation #meta #datascience #machinelearning #huggingface

Going from R to Python, 7/21/22, TikTok

I love #rstats, but spend most of my time now in #python #datascience #codetok #machinelearning

Data / Target Leakage, 7/22/22, TikTok YouTube

Replying to @Data Storyteller Here are two examples of data or target leakage. I bet people have other fun examples. #datascience #targetleakage #dataleakage #machinelearning

Cruising Along Coding, 7/23/22, TikTok

Dreaded git push error. Had a little help tonight. #git #datascience #python

Dynamic Adversarial Data Collection, 7/23/22, TikTok YouTube

Offering ways to improve your machine learning models #huggingface #datascience #codetok #datacentricai #adversarial

Data Prep will be Quick, 7/24/22, TikTok

News flash: Data scientists spend lots of time on data prep/exploration #datascience #dataengineering #analytics #codetok

Notebooks versus Script, 7/26/22, TikTok

I like notebooks for data science, but others differ. #datascience #jupyternotebook #codetok #python

Github Copilot, 7/27/22, TikTok YouTube

I still haven't tried Copilot. Have you? #datascience #codetok #codex #copilot #python

Stackoverflow, 7/29/22, TikTok

How else can you work? #datascience #stackoverflow #codetok

Need to Study Your Data, 8/1/22, TikTok YouTube

Probe the data #dataanalysis #datascience #statistics #bias

Have projects, 8/3/22, TikTok

Have some projects in your github #datascience #github #codetok

ICML Satire, 8/3/22, TikTok

Funny stuff, not created by me - #datascience #codetok #deeplearning

Training an image classifier using 🤗 transformers, 8/4/22, TikTok YouTube

Training an image classifier using 🤗 transformers #datascience #analytics #codetok #deeplearning #huggingface Longer video at other site using the same -rajistics

Explaining ML Models, 8/6/22, TikTok YouTube

Is explainability important for you? #datascience #explainability #interpretability #statistics #codetalk #machinelearning

Data Prep will be Quick, 8/7/22, TikTok

It always takes longer than you want to get your data prepped. #datascience #dataengineering #analytics #codetok

Zero-shot object detection, 8/9/22, TikTok YouTube

Zero-shot object detection. #datascience #codetok #huggingface #objectdetection #deeplearning #zeroshotclassification

When people ask about AGI, 8/10/22, TikTok

I like to stay practical; there is plenty to get excited about and to worry about without AGI. AGI is artificial general intelligence, the idea that computers will be sentient and think like people. #agi #datascience #artificialintelligence #codetok

Machine learning into production, 8/11/22, TikTok

Don't feel bad if you haven't put a machine learning model into production. Lots of valuable data scientists haven't done that.

Predicting Passes, 8/12/22, TikTok YouTube

Overdue for sports analytics #datascience #analytics #codetok #sportsanalytics #machinelearning

AI for College Math, 8/14/22, TikTok

Amazing how this stuff keeps getting better #datascience #codetok #machinelearning #codex

NFL Analytics, 8/16/22, TikTok YouTube

Sports! #datascience #analytics #codetok #machinelearning #rstats #footballanalytics #statistics

Old Github Projects, 8/17/22, TikTok

And even better when they submit an issue #datascience #codetok #opensource

Target leakage, 8/18/22, TikTok

Leakage is omnipresent #datascience #analytics #codetok #targetleakage

Explanations for transformers, 8/18/22, TikTok

Explanations for transformers gently #datascience #codetok #deeplearning

Red Flags in Data Science, 8/19/22, TikTok

We all want to get paid. But just know you will end up miserable. #datascience #codetok #analytics

My data science setup, 8/20/22, TikTok YouTube

My data science setup for now #datascience #codetok #python #rstats #posit #vscode #googlecolab #digitalocean #conda

K-means algorithm, 8/22/22, TikTok YouTube

Fun way to talk about K-means algorithm #datascience #codetok #analytics #machinelearning

Updating your Linkedin Profile, 8/23/22, TikTok

Filling in those job duties 🚩🚩 #datascience #codetok

Best use of Machine Learning, 8/24/22, TikTok

Peak ML #datascience #codetok #huggingface #gradio #huggable #imageclassification

Stable Diffusion, 8/25/22, TikTok

Stable diffusion, go run it yourself! It's so awesome. #datascience #codetok #aipub #huggingface #machinelearning

Duckdb and Spark, 8/26/22, TikTok

Seen this being hashed out on Twitter and had to join #dataengineering #codetok #duckdb #spark #datascience

Open Source and Stable Diffusion, 8/27/22, TikTok YouTube

Open Source with Stable Diffusion - #datascience #codetok #machinelearning #stablediffusion #opensourcesoftware

Kaggle, 8/28/22, TikTok

What's the deal with those competition rules #datascience #codetok #analytics #kaggle

Me Starting in Data Science, 8/30/22, TikTok

#duet with @Sylar2.5 #parodysong #datascience #codetok

Loss Functions, 8/30/22, TikTok Instagram YouTube

Loss Functions - simple example of MAE versus RMSE #datascience #statistics #analytics #codetok #regression
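
For readers who want the comparison in code, here is a minimal sketch of MAE versus RMSE. The numbers are made up for illustration and are not taken from the video. Both sets of predictions have the same MAE, but RMSE is larger when the error is concentrated in one big miss.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: the average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: squaring weights large errors more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Illustrative data (an assumption, not from the video).
y_true = [10, 10, 10, 10]
even_errors = [8, 12, 8, 12]     # every prediction is off by 2
one_big_miss = [10, 10, 10, 18]  # three perfect predictions, one off by 8

print("even errors:  MAE", mae(y_true, even_errors), "RMSE", rmse(y_true, even_errors))
print("one big miss: MAE", mae(y_true, one_big_miss), "RMSE", rmse(y_true, one_big_miss))
```

If large misses are much more costly than small ones, RMSE is usually the better fit; if every unit of error costs about the same, MAE is easier to explain.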

Start with Examples, 9/3/22, TikTok

Code not working? Start with the documented examples #datascience #rstats #machinelearning #codetok #python

Maths, 9/4/22, TikTok YouTube

What a data scientist does #datascience #analytics #codetok #python

AI for Pose Detection, 9/5/22, TikTok YouTube

Using AI for Pose Detection, this is such a cool application. #datascience #deeplearning #codetok #posedetection #sportsanalytics

Videos with stable diffusion, 9/7/22, TikTok

Videos with stable diffusion #datascience #machinelearning #stablediffusion #codetok

Using SMOTE for Imbalanced Data, 9/8/22, TikTok

Root Access, 9/9/22, TikTok

The damage I have done with root access. What have you done? #codetok #python

Embeddings / Latent Space, 9/10/22, TikTok

Replying to @petererickson.art This was tough, a lot of ground to cover. Let me know what I messed up on. I also have related videos on embeddings @rajistics #datascience #stablediffusion #dalle #deeplearning #codetok #machinelearning #latentspace

Interpolation or Extrapolation, 9/13/22, TikTok

Critical question when framing out analytic questions, since extrapolation has got me into trouble before. #datascience #analytics #codetok

LAION Latent Space for Stable Diffusion, 9/14/22, TikTok LinkedIn

Showing the latent space for stable diffusion. #stablediffusion #datascience #machinelearning #codetok #umap

Hard to be a Statistician, 9/15/22, TikTok

It's rough for statisticians, machine learning is so popular #datascience #analytics #statistics #machinelearning

Interpreting stable diffusion, 9/16/22, TikTok

Interpreting stable diffusion #stabilitydiffusion #datascience #codetok #machinelearning #texttoimage

Working with Categorical data, 9/17/22, TikTok

Working with Categorical data using ordinal, one hot (dummy), and target encoding #datascience #statistics #analytics #featureengineering
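
As a rough illustration of the three encodings named above, here is a minimal sketch in plain Python. The `colors` and `target` lists are invented for the example; in real projects a library encoder would normally do this work.

```python
from collections import defaultdict

# Toy data (an assumption for illustration only).
colors = ["red", "blue", "red", "green", "blue", "red"]
target = [1, 0, 1, 0, 1, 0]

categories = sorted(set(colors))  # ['blue', 'green', 'red']

# Ordinal encoding: one integer per category.
# Note this implies an ordering that may not really exist.
ordinal_map = {c: i for i, c in enumerate(categories)}
ordinal = [ordinal_map[c] for c in colors]

# One-hot (dummy) encoding: one 0/1 column per category.
one_hot = [[int(c == cat) for cat in categories] for c in colors]

# Target encoding: replace each category with the mean target for that category.
# Compute these means on training data only, otherwise the target leaks into the feature.
sums, counts = defaultdict(float), defaultdict(int)
for c, y in zip(colors, target):
    sums[c] += y
    counts[c] += 1
target_map = {c: sums[c] / counts[c] for c in categories}
target_encoded = [target_map[c] for c in colors]

print(ordinal)
print(one_hot)
print(target_encoded)
```

Ordinal is compact but invents an order, one-hot avoids that at the cost of more columns, and target encoding stays compact but needs care to avoid leakage.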

Data Engineering, 9/17/22, TikTok

Then my team builds data pipelines for the next eight months #datascience #dataengineering #analytics

Code examples, 9/20/22, TikTok

I much prefer working through code examples to decoding equations. I can't be the only one. #datascience #statistics

Prediction Intervals, 9/20/22, TikTok

Why you want prediction intervals instead of point predictions #datascience #machinelearning #statistics #predictioninterval

Prediction Intervals with conformal prediction, 9/21/22, TikTok Instagram

Getting prediction intervals with conformal prediction. This is a very simple intro, it can do much more. #datascience #statistics #predictioninterval #conformalprediction
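
One common simple variant is split conformal prediction, sketched below in plain Python. This is an illustration under assumptions, not necessarily the exact method shown in the video: `predict` is a hypothetical stand-in for any fitted regressor, and the calibration data is made up.

```python
import math

def conformal_quantile(residuals, alpha):
    """Quantile of absolute calibration residuals for miscoverage rate alpha,
    using the finite-sample corrected rank ceil((n + 1) * (1 - alpha))."""
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(residuals)[rank - 1]

def predict(x):
    """Hypothetical point-prediction model (assumption for this sketch)."""
    return 2.0 * x

# Held-out calibration data that the model was not trained on (made up).
calib_x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
calib_y = [2.5, 3.0, 6.5, 8.0, 9.0, 12.5, 15.0, 15.5, 18.0]

residuals = [abs(y - predict(x)) for x, y in zip(calib_x, calib_y)]
q = conformal_quantile(residuals, alpha=0.2)  # aim for roughly 80% coverage

x_new = 10
interval = (predict(x_new) - q, predict(x_new) + q)
print("prediction interval for x = 10:", interval)
```

The interval width comes entirely from how wrong the model was on the calibration set, which is why the approach works with any underlying model.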

Visual Question/Answering with Document AI, 9/22/22, TikTok

Visual Question/Answering with Document AI #datascience #analytics #codetok #huggingface #documentai

Rust for machine learning, 9/25/22, TikTok Instagram

Rust for machine learning. It's useful in some cases for ML, but learn Python first. #datascience #codetok #python #machinelearning #rust

No Attention Span, 9/26/22, TikTok

I have no attention span. How will I learn from these videos? #datascience #codetok #python

Anomaly detection benchmark, 9/26/22, TikTok

Anomaly detection is hard. This is an introduction to anomaly detection algorithms. The video focuses on the results for ADBench and what data scientists should now do. #datascience #analytics #codetok #anomalydetection @rajistics

Music Videos with AI, 9/28/22, TikTok

Creating music videos with stable diffusion and whisper. This colab notebook uses a dream studio backend for the images. Another great step in generating AI content. #datascience #analytics #stablediffusion

Do Over, 9/29/22, TikTok

At least it will be faster to build the second time. Ugh. How often have you had to recode something?

Cleaning data, 9/30/22, TikTok

Cleaning data is such a pain. I remember having 130+ unique combinations for US States in one project.

Distance metrics, 10/2/22, TikTok

Replying to @chairstaple so many good distance metrics - what's yours? This video covers Hamming, Levenshtein, Euclidean, Manhattan, and Mahalanobis distance. These come in handy when cleaning data or doing multivariate analysis, such as anomaly detection.
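
Four of these metrics fit in a few lines of plain Python, sketched below. Mahalanobis distance is left out because it needs a covariance matrix, which is easier with a numerical library. The function names are just illustrative choices.

```python
import math

def hamming(a, b):
    """Number of positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length inputs")
    return sum(x != y for x, y in zip(a, b))

def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]

def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def manhattan(p, q):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(p, q))

print(hamming("karolin", "kathrin"))
print(levenshtein("Califronia", "California"))  # a typo of the kind seen when cleaning data
print(euclidean((0, 0), (3, 4)), manhattan((0, 0), (3, 4)))
```

Edit distances like Levenshtein are the ones that help most with messy text fields, while Euclidean and Manhattan apply to numeric feature vectors.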

Visualization tools, 10/3/22, TikTok

ggplot, matplotlib, plotly, and seaborn are what data scientists use to make a plot or graph. #datascience #visualization #plots #analytics

Plotly, 10/5/22, TikTok

Replying to @jbfjhcfv plotly is a great package for folks using R or Python. It's open source, so anyone can use it. #datascience #visualization #analytics #plotly #python #rstats

Data Engineer, 10/6/22, TikTok

It pays to be organized. Find a friendly data engineer if you need to. #datascience #analytics

About Me, 10/7/22, TikTok

Data science is a pretty awesome job. Much better than my past jobs of working the IT helpdesk or painting rocks. #datascience #analytics #statistics

NLP with Spacy, 10/8/22, TikTok

Quick intro to spaCy, which is a standard tool for people doing natural language processing #nlp or text analytics. Not my best video, but it's Friday and it's late. #datascience #analytics #codetok #python

When they complain about a model's performance, 10/8/22, TikTok

Great way to get under the skin of your data scientist. #datascience #analytics #codetok

Prediction Intervals: Why they are Useful, 10/9/22, TikTok

Why you want prediction intervals instead of point predictions. This is a repost because the first one was taken down. #datascience #codetok #machinelearning #statistics #predictioninterval

Baseline Model for Time Series, 10/9/22, TikTok Instagram YouTube

Always have a baseline model. For time series, you can often compare to what happened in a previous time step, like last week. There are error metrics like MASE built on this idea. #datascience #codetok #statistics #timeseriesforcasting #timeseries I can do more of these baselines if you all find this useful.
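
To make the baseline idea concrete, here is a minimal sketch of a naive forecast and of MASE, which scales a forecast's error by the in-sample error of that naive forecast. The series and the "model" forecast are made-up numbers, and the function names are illustrative.

```python
def naive_forecast(history, horizon, season=1):
    """Repeat the value from `season` steps earlier.
    season=1 repeats the last value; season=7 on daily data means 'same day last week'."""
    return [history[-season + (h % season)] for h in range(horizon)]

def mase(actual, forecast, history, season=1):
    """Mean absolute scaled error: forecast MAE divided by the
    in-sample MAE of the naive forecast. Below 1 beats the baseline."""
    mae_forecast = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    naive_errors = [abs(history[t] - history[t - season])
                    for t in range(season, len(history))]
    mae_naive = sum(naive_errors) / len(naive_errors)
    return mae_forecast / mae_naive

# Made-up series for illustration.
history = [10, 12, 11, 13, 12, 14]
actual = [15, 13]
model_forecast = [15, 14]  # hypothetical model output

baseline = naive_forecast(history, horizon=2)
print("baseline forecast:", baseline)
print("baseline MASE:", mase(actual, baseline, history))
print("model MASE:", mase(actual, model_forecast, history))
```

If a model cannot get its MASE comfortably below the baseline's, the extra complexity is probably not paying for itself.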

Recall Precision, 10/10/22, TikTok

Your data science 101 reminder when working with classification models. #datascience #statistics #codetok
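
As a quick refresher in code, here is a minimal sketch of precision and recall computed from raw labels. The labels are invented for the example.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision: of the predicted positives, how many were right.
    Recall: of the actual positives, how many were found."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Made-up labels: 4 actual positives, 3 predicted positives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```

Accuracy looks respectable here even though half of the positives were missed, which is why the two metrics are worth checking separately.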

Diffusion models for markup, 10/13/22, TikTok Instagram

Diffusion models for markup. #datascience #machinelearning #stablediffusion

Cosine Similarity, 10/14/22, TikTok

Cosine similarity is a must know when working with vectors. It's very useful and widely used in #machinelearning #datascience #statistics Should we go deeper into working with vectors and matrices? #python
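
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal sketch in plain Python, with illustrative vectors:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1 = same direction,
    0 = orthogonal, -1 = opposite direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # same direction, different length
print(cosine_similarity([1, 0], [0, 1]))        # orthogonal
print(cosine_similarity([1, 0], [-1, 0]))       # opposite
```

Because only direction matters, two vectors with very different magnitudes can still be maximally similar, which is one reason the measure is popular for comparing embeddings.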

Inducing Positive Perspectives with Text Reframing, 10/14/22, TikTok Instagram YouTube

AI that makes you feel better. The paper is Inducing Positive Perspectives with Text Reframing. You can find a demo over at 🤗 hugging face spaces by Ella2323 called Positive Reframing. #machinelearning #datascience #codetok

Pronounce LaTeX, 10/15/22, TikTok

Who would have known that my pronunciation of LaTeX would be such a big deal and so divisive. It's all good. Go listen for yourself. @rajistics

Pandas in Sklearn pipelines, 10/18/22, TikTok YouTube

It's almost here. Full support for pandas in sklearn pipelines. #machinelearning #datascience #codetok #python #sklearn #sci-kit

SQL, 10/18/22, TikTok

SQL just doesn't go away and is hipper than ever. #datascience #dataengineering

Explainability with Transformer Interpret Vision Models, 10/19/22, TikTok Instagram YouTube

Getting explainability when working with transformer based image or vision models. Uses Captum on the backend, but makes it easy to get image attributions. #datascience #machinelearning #computervision #captum #huggingface #explainability #codetok

Outliers, 10/20/22, TikTok

Those pesky outliers.

XGBoost versus Neural Network, 10/21/22, TikTok

Our fellow algorithms calling mom featuring our linear model, XGBoost, and Neural Networks. I had fun making them.

TabPFN Rant, 10/22/22, TikTok Instagram

TabPFN revolution in data science. Please don't waste your time on all this hype. Every week there is a revolution announced on Twitter. Ignore it. True greatness takes time. #datascience #machinelearning #statistics #tabpfn

Let me run one more test, 10/23/22, TikTok

Analysis never ends.

CLIP Interrogator, 10/25/22, TikTok Instagram YouTube

CLIP Interrogator is available over at the hugging face spaces. Have fun! #datascience #machinelearning #stablediffusion #huggingface

Dalle Mini Trademark, 10/28/22, TikTok Instagram

Mixing in a bit of law with the usual data science. Let me know if this is interesting or if you are waiting for the deep dive on dbscan clustering. #craiyon #dallemini #stablediffusion #texttoimage #machinelearning #datascience

Anscombe's quartet, 10/29/22, TikTok Instagram YouTube

Reminder to visualize your data with one of my favorites #anscombesquartet #datavisualization #datascience #statistics

Explaining patents, trademarks, copyright, and licenses, 10/30/22, TikTok Instagram

Software exec at the end is the best. Your quick intro to patents, trademarks, copyright, and licenses. I see too many comments where people get these confused.

Contrastive learning, 11/3/22, TikTok Instagram

Contrastive learning is common for folks working in NLP and images. This was new to me, so wanted to share the intuition a bit more widely. This is an introduction, there are so many contrastive loss functions for different scenarios.

Positivity Spreadsheet, 11/4/22, TikTok Instagram

Having some fun connecting a spreadsheet to an ML model. It wasn't too hard and it's pretty cool to have it working this way. #datascience #machinelearning #huggingface

Causation, 11/5/22, TikTok

Simple tip, never claim causation. Unless you have an experimental design, it's hard to prove. #datascience #machinelearning #statistics

It's not in the book, 11/5/22, TikTok

Great data scientists figure out the best questions come from talking to people. #datascience The book is Practical Python and OpenCV by Rosebrock, great resource.

FIGS Interpretable Model, 11/5/22, TikTok Instagram YouTube

Interpretable models are often overlooked, but a great addition to your data science toolkit. imodels is a great Python package for getting started. #datascience #machinelearning #interpretablemodels #imodels #statistics

Editing Facts in GPT, 11/6/22, TikTok Instagram YouTube

Your weekly dose of LLM news. I liked this because it had interesting results with a smart approach. #datascience #machinelearning #largelanguagemodels

Flan T5 Fails, 11/9/22, TikTok Instagram

Checking out Flan T5 large language models. Let me know what wisdom you can find in this model. #machinelearning #datascience #largelanguagemodels

Calibrating your model, 11/11/22, TikTok Instagram YouTube

It's important to make sure your model is well calibrated. This becomes especially important with imbalanced data. #machinelearning #datascience #statistics
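
One simple way to check calibration is a reliability table: group predictions into probability bins and compare the average predicted probability with the observed positive rate in each bin. The sketch below is a minimal illustration with made-up predictions, and `reliability_table` is a name chosen for this example.

```python
def reliability_table(y_true, y_prob, n_bins=5):
    """Return (mean predicted probability, observed positive rate, count)
    for each non-empty equal-width probability bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    table = []
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for _, p in members) / len(members)
        observed = sum(y for y, _ in members) / len(members)
        table.append((round(mean_pred, 3), round(observed, 3), len(members)))
    return table

# Made-up labels and predicted probabilities.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_prob = [0.1, 0.1, 0.3, 0.3, 0.7, 0.7, 0.9, 0.9]

for mean_pred, observed, count in reliability_table(y_true, y_prob):
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={count}")
```

In a well calibrated model the two columns track each other; large gaps in a bin suggest the predicted probabilities should not be read as real-world frequencies without recalibration.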

Like p values, 11/12/22, TikTok

Data scientists will typically use regularization, which means no p values. #machinelearning #datascience #statistics #pvalues

Ablation with Stable Diffusion, 11/12/22, TikTok Instagram YouTube

Applying a classic methodology of ablation when working with stable diffusion prompts. Ablation is very common in many techniques to understand how models are working. #machinelearning #datascience #statistics #stablediffusion #ablationsurgery

Predict Sentiment in 3 lines, 11/13/22, TikTok Instagram

New style of content, let me know if you want more like this. Predict sentiment #machinelearning #datascience #transformers #huggingface

My advice on Large Language Models and Stable Diffusion, 11/14/22, TikTok

Replying to @joshhenny it's a great time to learn about #largelanguagemodels or #stablediffusion #datascience #machinelearning

Learning Curves, 11/16/22, TikTok Instagram YouTube

Learning curves, it's a technique I use all the time when training models. Thanks to Todd C for showing me the best way to explain this. #datascience #machinelearning #statistics #bigdata. This video is inspired by the many machine learning experts to whom I had to explain that sampling is a useful and valid technique.

Automatic Speech recognition in 3 lines of code, 11/17/22, TikTok Instagram

Automatic Speech recognition in 3 lines of code using wav2vec2 in transformers #datascience #machinelearning #huggingface #automaticspeechrecognition #asr

Galactica by meta, 11/17/22, TikTok Instagram

Galactica by meta. Cool model, poor form on sharing it out. #datascience #machinelearning I feel for students, it was going to write a lot of papers.

Regularization / Lasso, 11/19/22, TikTok Instagram

I need to focus on adding more Regularization to my life. #datascience #statistics #regularization

Relevance maps for image classification, 11/21/22, TikTok

Relevance maps for image classification. Model explainability is always important. #datascience #explainability #machinelearning #imageclassication

object detection in diffusiondet, 11/21/22, TikTok Instagram

Using diffusion for object detection in diffusiondet. #datascience #machinelearning #objectdetection #computervision

NeurIPS Below Average, 11/23/22, TikTok

From the article: How do Authors' Perceptions about their Papers Compare with Co-authors' Perceptions and Peer-review Decisions? #statistics @rajistics

AI for Diplomacy, 11/23/22, TikTok Instagram

Meta's Cicero for playing Diplomacy is impressive and a bit scary. #statistics #datascience #machinelearning #diplomacy

Missing Data, 11/24/22, TikTok

Missing data happens all the time. Don't just jump to dropping rows or using imputation techniques. #dataengineering #statistics #datascience #imputation

Database Alignment Chart, 11/24/22, TikTok

Saving you a trip to Twitter. #dataengineering #databases There is one big vendor left out. Probably get sued for leaving them out.

Go learn to build a web demo, 11/25/22, TikTok

Got some time this weekend? Go build a web demo. #datascience #statistics #shinyr #rstats #python #gradio #streamlit

Stable diffusion 2.0, 11/25/22, TikTok

Stable diffusion 2.0 just dropped and there are a lot of unhappy people. Who knew giving away software could create so much angst. #datascience #stablediffusion

About Me, 11/28/22, TikTok

Introducing myself, like a year too late. Hope this fills the gaps around this channel.

Explainer Dashboard, 11/29/22, TikTok

A walkthrough of the explainer dashboard. It contains a lot of the tools you want when trying to explain your models. #datascience #machinelearning #statistics #permutationimportance #partialdependence

Evaluating Software Packages, 11/30/22, TikTok

Some general advice on how to evaluate software packages. #datascience #machinelearning #github

OpenAI ChatGPT Intro, 12/1/22, TikTok

This post was based on great stuff on Twitter, especially Ben's Bites. I wanted to show the chat output, so wasn't able to keep the original tweet info. Go play or read about it. #datascience #machinelearning #openai #chatgpt Share back anything cool you find.

Data Engineering, 12/2/22, TikTok

I have no desire to build data infrastructure. I will leave that to my #dataengineer friends. #datascience

Visual question answering, 12/3/22, TikTok

Visual question answering (VQA) is another cool task you can do with machine learning. #datascience #machinelearning #visualquestionanswering

Data Science Team Meeting, 12/4/22, TikTok

Typical issues that often come up in everyday data science. Data scientists only spend a small amount of time on algorithms. #datascience #machinelearning

Unfair Comparisons Visualizations, 12/5/22, TikTok

Visualizations for showing variation in the data or uncertainty. Based on Unfair Comparisons by Eli Holder. #datascience #machinelearning #statistics #datavisualization

Data analyst versus data scientist, 12/7/22, TikTok

No one actually knows what a data scientist does, take advantage of it.

Avoiding ChatGPT, 12/7/22, TikTok

ChatGPT has sucked up a lot of my attention. Will do a post soon on how it works.

Explaining ChatGPT, 12/7/22, TikTok

Simply explaining how ChatGPT works. All the technical details of ChatGPT have not been released, so this is based on what OpenAI has been doing over the last few years. #datascience #machinelearning #openai #chatgpt #reinforcementlearning

Watermarking AI content, 12/9/22, TikTok

Replying to @urdar635 watermarking output from AI models is something that is being considered. It's done by adding some "signal" to the output of the model. The outputs are probabilities of tokens, so it's possible to slightly modify them in a way that wouldn't be detectable when reading the output. See Aaronson's blog for more. #datascience #machinelearning #openai #chatgpt #watermarking

Error Analysis, 12/10/22, TikTok LinkedIn

So what did I miss when you do error analysis? #machinelearning #datascience #statistics #erroranalysis

Pytorch adding model compile, 12/10/22, TikTok

Tensorflow fans are probably seething since they were first and ignored. All good and will be easy for pytorch users to take advantage of modern hardware. #machinelearning #tensorflow #pytorch

Difficult Training Examples, 12/13/22, TikTok

Reminder to be smart about how you use your training data. #machinelearning #datacentricai #datascience #waymo #reinforcementlearning

In context learning, 12/14/22, TikTok

In context learning, let's dig deeper and let me know what I should do next. #machinelearning #datascience #largelanguagemodels #incontextlearning

Tensorflow versus pytorch, 12/15/22, TikTok

Replying to @philosophywithsuf explaining the irony for pytorch building a graph and the history of tensorflow

Data science news / resources, 12/16/22, TikTok

Sharing my favorite data science news and resources, find it at bit.ly/raj_reads #machinelearning #datascience

Who would you hire, 12/17/22, TikTok LinkedIn

The best way to learn data science is working with data. You don't need to spend money on courses or books. Spend time doing useful projects. #machinelearning #datascience

AI filter, 12/17/22, TikTok

#aifilter #aifilterchallenge had to try it out and got a bit more buff

Who would you hire - Part 2, 12/18/22, TikTok

Replying to @rajistics here are two themes I wanted to highlight. The second candidate showed more analytic maturity.

Audio spectrogram transformer, 12/19/22, TikTok

Audio spectrogram transformer shows how widely we can use #machinelearning #datascience #mlaudio #deeplearning

Point-E, 12/20/22, TikTok YouTube

Point-E from #openai. Generating 3D point clouds from text #datascience #machinelearning

Probing Large Language Models, 12/21/22, TikTok

Highlight great research from #anthropic studying the behavior of large language models. #machinelearning #datascience #largelanguagemodels

A Guide to Better Presentations, 12/23/22, TikTok YouTube

A couple of examples of what not to do and what you should do when presenting your data science results to the business. #datascience #statistics #machinelearning #enterpriseai

Planning to Test YouChat App, 12/24/22, TikTok

YouChat. Looks impressive. I will try it out this weekend and let you know.

Optimization TSP, 12/24/22, TikTok YouTube

Quick introduction to optimization and for advanced folks, go run a notebook from gurobi or do the Kaggle Santa challenge. #datascience #machinelearning #optimization #travelingsalesmanproblem #gurobi

YouChat and Retrieval Models, 12/26/22, TikTok Instagram YouTube

YouChat and retrieval augmented models. To play around with this, check out haystack from deepset. #datascience #machinelearning #youchat #chatgpt #openai #retrievalaugmentedmodel #questionanswermodel

Webinar on ChatGPT, 12/27/22, TikTok

Staying busy and doing a public talk on Generative AI. It will be about 40 minutes, so it gives me a chance to dive into more details and answer questions. Come join us! Link is http://bit.ly/raj_datahour #webinar #datascience #machinelearning #generativeai #chatgpt

Politics of ChatGPT, 12/27/22, TikTok Instagram YouTube

The politics of ChatGPT: it's no different than any other technology and is not neutral. If you want a simple explanation of how ChatGPT works check out @rajistics. Open source language models have a role here as well. #datascience #machinelearning #chatgpt #openai #technologyethics

Dtreeviz 2.0 - Visualizing Decision Trees, 12/28/22, TikTok Instagram YouTube

Dtreeviz 2.0 - Visualizing Decision Trees

Med-Palm - Clinical Large Language Model from Google, 12/29/22, TikTok Instagram YouTube

Applying PaLM to the medical domain by using instruction prompt tuning

GPT takes the Bar Exam (Law), 12/30/22, TikTok Instagram YouTube

GPT3.5 takes the bar exam with very little tuning. It does pretty well. #gpt #datascience #machinelearning #barexam #law

Clustering with K-means, 12/31/22, TikTok Instagram YouTube LinkedIn

Clustering with k-means. This skit was inspired by the examples in Schubert's paper on why you should stop using the elbow criterion for k-means. Any other clustering fails out there? #datascience #statistics #machinelearning #kmeans #clustering

2022 Recap and Airtable Link, 1/1/23, TikTok

Looking forward to a lot more videos in 2023, let me know topics I should cover. For all my videos, I put them in an airtable spreadsheet available at bit.ly/raj_videos

Why models cheat, 1/3/23, TikTok Instagram YouTube

Models that cheat, take shortcuts, and leak information are all part of the data scientist lifestyle. Every data scientist has a story like this. #datascience #machinelearning

Big Bench reasoning benchmark, 1/4/23, TikTok Instagram YouTube

Just how smart are ChatGPT and other #largelanguagemodels? Big Bench is a set of benchmark tests to assess the performance of the models. And the most recent models from Google are doing pretty well! #datascience #machinelearning #chatgpt

Image captioning models, 1/5/23, TikTok Instagram

Image captioning models - GIT from Microsoft and BLIP from Salesforce #datascience #machinelearning #imagecaptioning

Corporate Legal on Software Licensing, 1/6/23, TikTok

A reminder that most enterprises favor Apache and MIT licenses. As a developer, use what you please. But to reach people working within companies, it's best to stick to the classic open source licenses. #datascience #machinelearning #softwarelicensing

Scaling laws, 1/7/23, TikTok Instagram YouTube LinkedIn

Scaling laws help us figure out how to manage the amount of training data versus the model size. DeepMind showed with Chinchilla that by using more data, you can use a smaller model. This went against the known wisdom from OpenAI's research. This is a big deal because lots of resources are spent on building those models. Ask more questions in the comments. #datascience #machinelearning #largelanguagemodels #openai #deepmind #nvidia #microsoft #azure #huggingface #chatgpt
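
As a rough illustration of the data-versus-model-size trade-off, here is a minimal sketch. It assumes two widely quoted rules of thumb that are not stated in the caption: training compute of about 6 FLOPs per parameter per token, and a compute-optimal ratio of about 20 tokens per parameter. The function name and constants are hypothetical and only meant to show the arithmetic.

```python
import math

# Rule-of-thumb constants (assumptions for illustration, not exact fitted values)
FLOPS_PER_PARAM_TOKEN = 6   # training compute C is roughly 6 * N * D
TOKENS_PER_PARAM = 20       # compute-optimal data D is roughly 20 * N


def compute_optimal_split(compute_flops):
    """Return (params, tokens) that roughly balance model size and data
    for a given training compute budget under the assumptions above."""
    # Solve C = 6 * N * (20 * N) for N
    params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


params, tokens = compute_optimal_split(5.88e23)
print(f"{params:.2e} parameters, {tokens:.2e} tokens")
```

With a budget of 5.88e23 FLOPs, this works out to roughly 70 billion parameters and 1.4 trillion tokens, in line with the Chinchilla configuration.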

Anthropic Claude, 1/8/23, TikTok YouTube

Anthropic is starting to preview their model and people are comparing it to ChatGPT. Thanks to Riley Goodside for sharing screenshots. It looks pretty impressive. #datascience #machinelearning #largelanguagemodels #anthropic #claude #chatgpt

Avoid Overplotting, 1/8/23, TikTok Instagram YouTube

Dealing with overplotting, another visualization tip from Data to Viz #datascience #machinelearning #statistics #datavisualization

Evaluating Pass Rushers in Football, 1/13/23, TikTok Instagram YouTube

Big data bowl submissions are going in and lots of great sports analytic work. This one is on strain for evaluating pass rushers. #datascience #statistics #bigdatabowl #nfl

Using LangChain with GPT3, 1/14/23, TikTok Instagram YouTube

Using LangChain with GPT3. I am seeing lots of cool demos based on LangChain and needed to make sure I covered it. It's an easy way to take advantage of #largelanguagemodels #datascience #machinelearning #gpt3 #langchain

Shiny in Spaces, 1/15/23, TikTok

Get shiny to run on hugging face spaces (or even some other web app) #huggingface #posit #rstudio #shiny #datascience

Airbnb Support Utilizes Generative AI, 1/16/23, TikTok

How Airbnb customer support is using generative AI. This is a great example of how @rajistics in context learning is growing and replacing traditional machine learning approaches for some use cases. #datascience #machinelearning #largelanguagemodels #generativeai #incontextlearning

Picking a GPU for Deep Learning, 1/16/23, TikTok Instagram YouTube

Picking a GPU for deep learning based on Tim Dettmers' classic blog post. #datascience #machinelearning #deeplearning #gpu

GPT4 Hype, 1/16/23, TikTok Instagram YouTube

GPT4 hype that it will be 100 trillion parameters. This doesn't make any sense. See the video on scaling laws @rajistics and think about the compute resources for inference. #datascience #machinelearning #openai #gpt4

Tesla Self Driving Fake, 1/17/23, TikTok

Tesla self driving has been such a scam. I am so disappointed. I really believed that self driving could be pretty useful (I knew it wasn't going to be perfect). #tesla #fsd #self-driving #fakedemo

BigCode and Github CoPilot, 1/19/23, TikTok Instagram YouTube

How companies use your data for training models will be a big issue this year. GitHub is being sued for Copilot and Hugging Face has been building out datasets that respect creators. #huggingface #bigcode #github #copilot #datascience

Google's Sparrow for Chat, 1/21/23, TikTok Instagram YouTube

Google's Sparrow is the rumored competitor to OpenAI ChatGPT. Check out the paper to see lots of examples of it chatting. It looks really good! #datascience #machinelearning #chatgpt #openai #google #googlesparrow #largelanguagemodels

Learn Machine Learning, 1/21/23, TikTok

Google Colab, Kaggle, and LangChain are all great ways to start learning this weekend! #datascience #machinelearning #kaggle #googlecolab #langchain

Instructor - Embeddings Model, 1/22/23, TikTok Instagram YouTube LinkedIn

New state of the art embedding model, Instructor, for text is available! It accounts for task and domain when creating an embedding. #datascience #machinelearning #embeddings #word2vec #sentencetransformers #huggingface

Kubernetes for Data Scientists, 1/23/23, TikTok Instagram YouTube

Should you take the time to learn Kubernetes as a data scientist? Or are you already overloaded learning data science? #datascience #machinelearning #kubernetes

Using Synthetic Datasets, 1/25/23, TikTok Instagram YouTube

Synthetic datasets have given me a way to understand better how to do feature selection and model explainability. Try it out sometime. #datascience #machinelearning #syntheticdata #explainability

GPT-3 versus FinBERT, 1/26/23, TikTok Instagram YouTube LinkedIn

GPT-3 is powerful, but sometimes domain-specific models will do better. Pick the right tool for the job. #datascience #machinelearning #huggingface #chatgpt #openai #gpt3 #finbert

In-Context Learning for LLMs, 1/27/23, TikTok Instagram YouTube

My second try to explain in context learning or few shot learning with large language models. It's very cool and why these models are so exciting. My older video is here @rajistics #datascience #machinelearning #gpt3 #largelanguagemodels #fewshotlearning #incontextlearning

Efficient Large Language Model training with LoRA and Hugging Face PEFT, 1/27/23, YouTube LinkedIn

Based on:

DeepMind Submits a Paper to NeurIPS, 1/28/23, TikTok Instagram YouTube

Corporate research labs have changed academic work with their reluctance to provide reproducible research and getting around blind peer review. No answers from me, but want you all to be aware. #datascience #machinelearning #neurips #reproducibility

Meta's Chatbot Fails, User Jealous, 1/28/23, TikTok

#greenscreenvideo Jealous. Go see how bad Meta bungled their chatbot @rajistics

Label Errors and CleanLab, 1/29/23, TikTok YouTube

Cleanlab is open source and will improve your data quality. It's so underrated. This was hard to record vertically, so go try it out. #datascience #machinelearning #cleanlab #labelerror #confidentlearning #dataquality

Cheating on Kaggle, 1/30/23, TikTok Instagram YouTube

Cheating has reared its head again over at Kaggle. Some background for folks on Kaggle and cheating there. #datascience #machinelearning #kaggle #ottocompetition

OpenAI AI Classifier, 1/31/23, TikTok Instagram YouTube

I can't make this stuff up. OpenAI released their classifier and I saw all these messages about how ineffective it is. Wanted to get this news out. #datascience #machinelearning #openai I am just having fun here, so let's not get too worked up over my jokes.

Hands on Reasoning with AI, 1/31/23, TikTok Instagram YouTube

Try out these examples for yourself and lots more are available. It's scary cool how these models are working. #datascience #machinelearning #gpt3 #largelanguagemodels #flanT5 #reasoningwithpeople https://huggingface.co/spaces/osanseviero/i-like-flan

Enjoyed GPT-3 Trivia, French Pastries, 2/1/23, TikTok

GPT-3 trivia and French pastries I enjoyed at the 🤗 offsite. #datascience #machinelearning #gpt3 #openai #huggingface

Data Distributions for Modeling, 2/3/23, TikTok YouTube

Some common data distributions when modeling, including skewed and zero inflated. There are many other distributions, but I just wanted people to know that the normal distribution isn't normal in my experience. #datascience #statistics #datadistribution #zeroinflated #tweedie

Limits of AI Content Detection, 2/4/23, TikTok Instagram YouTube LinkedIn

OpenAI AI classifier is a great example to remind people of the limitations when detecting rare events. It's not intuitive, so I showed the math and need you all to get it. This happens in many contexts like detecting terrorists or diseases. #datascience #statistics #openai #baseratefallacy
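
The base rate math can be sketched with Bayes' rule. The numbers below are hypothetical and chosen only for illustration (they are not OpenAI's reported figures): 1% of texts are AI-written, the detector catches 90% of them, and it wrongly flags 5% of human-written texts.

```python
def positive_predictive_value(prevalence, true_positive_rate, false_positive_rate):
    """Probability that a flagged item is truly positive (Bayes' rule)."""
    flagged_true = prevalence * true_positive_rate          # true positives
    flagged_false = (1 - prevalence) * false_positive_rate  # false alarms
    return flagged_true / (flagged_true + flagged_false)


# Hypothetical detector: rare event (1%), 90% sensitivity, 5% false positive rate
ppv = positive_predictive_value(0.01, 0.90, 0.05)
print(f"Share of flagged texts that are really AI-written: {ppv:.1%}")
```

Even with a seemingly accurate detector, only about 15% of flagged texts are true positives under these assumptions, because the false alarms from the much larger human-written group swamp the true detections.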

Seven Stages of ChatGPT, 2/5/23, TikTok Instagram

How enterprises are dealing with ChatGPT: it's a pretty familiar cycle of grief. The good thing is it does open up lots of cool use cases. #datascience #machinelearning #chatgpt #enterprisearchitecture

Google's Bard, 2/6/23, TikTok Instagram YouTube

Google announced Bard, but we still don't know much. It is based on LaMDA, which has been around for a while. This is a safe bet, not a daring move. #datascience #machinelearning #largelanguagemodels #chatgpt #google

Climax, 2/7/23, TikTok Instagram YouTube

Climax, a new transformer-based model for weather and climate forecasting. Great example of the flexibility of transformer-based approaches. #datascience #machinelearning #transformers #climatemodel

SpeechT5 Audio Models, 2/8/23, TikTok Instagram YouTube

SpeechT5 audio models getting added to transformers. #datascience #machinelearning #huggingface #speecht5 #speechmodels #audiomodels

Using Histograms, 2/9/23, TikTok Instagram YouTube

Histograms are a great visualization tool. Here are some caveats and tips for using histograms. #datascience #statistics #datavisualization #histogram

Dumb, Dumber, and Dumbest (Feb 2023), 2/10/23, TikTok Instagram YouTube

Roundup of this week's news, let me know if you all like this format. I had a lot of fun making this. #datascience #machinelearning #dumbtechnews #openai #google #microsoft #stabilityai #meta #apple

Curse of Dimensionality, 2/11/23, TikTok Instagram YouTube

Curse of dimensionality reminds us to think carefully about feature selection. More isn't always better. Use a feature selection curve. #datascience #machinelearning #curseofdimensionality #featureselection

Variables in Auto Insurance, 2/12/23, TikTok YouTube

Replying to @rajistics as promised, the features or variables in auto insurance models. Keep the feedback coming. #datascience #machinelearning #autoinsurance #acturialscience earlier video on insurance @rajistics

Toolformer, 2/13/23, TikTok Instagram YouTube LinkedIn

Toolformer from Meta shows the possibilities of using APIs in an unsupervised way. #datascience #machinelearning #toolformer #largelanguagemodels

Replika and Erotic Role Play, 2/14/23, TikTok Instagram YouTube

Replika and the growth of these character chatbots or socialbots is emerging as a big use case within generative AI. Here is a recent controversy over the loss of the erotic role play (erp) functionality. #datascience #gpt3 #chatbot #socialbot #replika

Text to Chart, 2/15/23, TikTok Instagram YouTube

Text to Chart. It's easier than ever to build great charts using libraries like plotly or matplotlib. Are other people using ChatGPT for this? #datascience #machinelearning #chatgpt #matplotlib #plotly #python #stackoverflow

X-Decoder Vision/Language Model, 2/16/23, TikTok Instagram YouTube

X-Decoder from Microsoft. Check out the instructional text demo. I added in a video released by the team at the bottom. If too many people don't like that, I can release a version without that video. #datascience #machinelearning #x-decoder #pix2pix

Chatbot Concerns, 2/16/23, TikTok Instagram YouTube

Wrap up of current events going on with chat including #openai #chatgpt #bing #amazon #datascience #machinelearning

Random Forest History, 2/18/23, TikTok Instagram YouTube

Random forests and their ease of use are important in understanding modern data science. #datascience #machinelearning #statistics #randomforest #dataprep #decisiontree #fortran

OpenAssistant: Open Source ChatGPT, 2/19/23, TikTok Instagram YouTube

Replying to @anansaadi OpenAssistant is an open source project that aims to provide a chat based assistant that connects to other sources of information. It's great to see these open source projects, but just know they are very early in the development cycle. #datascience #machinelearning #openai #chatgpt #openassistant

GPT-4 Speculation, 2/20/23, TikTok Instagram YouTube

Speculating on GPT-4 size and performance. #datascience #machinelearning #gpt3 #gpt4

How to beat AI, 2/21/23, TikTok Instagram YouTube

AI only knows what it's trained on. So beat it by doing something new. The video shows recent examples of marines beating a surveillance system and of beating a computer playing Go. As a reminder, any production machine learning model should be monitored to catch any data shifts. #datascience #machinelearning #modelmonitoring #datadrift

Control Robots with ChatGPT, 2/22/23, TikTok Instagram YouTube

ChatGPT for Robotics is the latest hot paper. Large language models are the future interface. #datascience #machinelearning #largelanguagemodels #chatgpt #microsoft #robotics #promptcraft

Data Centric AI, 2/23/23, TikTok Instagram YouTube LinkedIn

Data Centric AI helps to remind us not to focus too much on the model or algorithms. In real data science, it's more about understanding your data and having high quality labeled training data. #datascience #machinelearning #datacentricai #cleanlab #erroranalysis

RFM and Machine Learning in CLV, 2/25/23, TikTok

Customer lifetime value is a common data science use case. There are many ways to calculate this, but here I introduce the classic RFM method and a machine learning alternative. #datascience #machinelearning #rfm #customerlifetimevalue #marketinganalytics

Customer LifeTime Value, 2/25/23, TikTok Instagram YouTube

https://www.tiktok.com/@rajistics/video/7204141835965500715

Composer - Generative AI, 2/26/23, TikTok Instagram YouTube

Composer will be sharing their new generative AI models and they look amazing. The key is they decompose the image, which then provides a lot more flexibility for creating new images. #datascience #machinelearning #stablediffusion #composer #generativeai

Feature Engineering, 2/27/23, TikTok Instagram YouTube

Feature engineering and data preprocessing are an important part of the machine learning process. #datascience #machinelearning #featureengineering

Pandas 2.0, 3/1/23, TikTok Instagram YouTube

Pandas 2.0 combining with Arrow. A short recap on how it fits in with polars, dplyr, and data.table. #datascience #machinelearning #rstats #python #pandas #polars #dplyr #datatable

ChatGPT Updates, 3/2/23, TikTok Instagram YouTube

ChatGPT price drop. Let's break down how much the price dropped, how OpenAI could drop the price, the effects on performance, what is going on with langchain, and the open source contenders. #datascience #machinelearning #chatgpt #openai #cohere #anthropic #flant5 #langchain

AI News March 2023, 3/3/23, TikTok Instagram YouTube

Roundup of all the big headlines, hope this is fun for you all. I laugh while making these, but wonder how many of you get all the references. #datascience #machinelearning #openai #google #meta #stabilityai #elonmusk #apple #google

LangChain Agent, 3/4/23, TikTok Instagram YouTube

Using agents in langchain with gpt-3. You can do this! Go check it out. #datascience #machinelearning #openai #gpt3 #langchain

Open Sourcing Large Language Models, 3/5/23, TikTok Instagram YouTube

Meta's less than open source model and some bad takes from Twitter. #datascience #machinelearning #largelanguagemodels #opensource #meta

Tips for Small Datasets, 3/6/23, TikTok Instagram YouTube

Working with small datasets. Several tips including using cross-validation, models like lasso, and running multiple iterations with different random seeds. #datascience #machinelearning #crossvalidation #elasticnet #lasso #randomseed

Tips for Managing Small Datasets, 3/7/23, TikTok Instagram

Working with small datasets. Several tips including using cross-validation, models like lasso, and running multiple iterations with different random seeds. #datascience #machinelearning #crossvalidation #elasticnet #lasso #randomseed

Word as Image, 3/7/23, TikTok Instagram YouTube

Word as Image - great use of generative AI models like stable diffusion to create fonts. Check out the paper at wordasimage.github.io #datascience #machinelearning #stablediffusion #generativeai #fonts

Best Machine Learning Tools, 3/9/23, TikTok Instagram YouTube

Best machine learning tools for competitions. Lots of great stuff here. #datascience #machinelearning #python #codetok

Nat.dev LLM Playground, 3/10/23, TikTok Instagram YouTube

Nat.dev playground is awesome. Should be a great reminder of the diversity of large language models. #datascience #machinelearning #largelanguagemodels #natdev #gpt3

GPT-3 in the Enterprise, 3/11/23, TikTok Instagram YouTube LinkedIn

Starting to see people productionizing GPT-3 workflows. I am a big fan of using large language models. Here is how one data scientist dealt with GPT3. #datascience #machinelearning #largelanguagemodels #gpt3

Uber D3 Automated Data Drift, 3/13/23, TikTok Instagram YouTube

Data drift analysis is a must for production workloads. Here is Uber's D3 system for automated drift analysis. This video covers types of data drift issues, different approaches for addressing them, and Uber's use of a Prophet model for anomaly detection. #datascience #machinelearning #mlops #datadrift #prophet

Ensembling: Better Decisions with Majority Voting, 3/14/23, TikTok Instagram YouTube LinkedIn

Ensembling is a key method in machine learning. This video introduces ensembling through majority voting. #datascience #machinelearning #ensembling #kaggle #majorityvoting
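
Majority voting can be sketched in a few lines. This is a minimal illustration with made-up predictions from three hypothetical classifiers (`model_a`, `model_b`, `model_c`), not the code from the video; an odd number of voters on binary labels avoids ties.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by taking, for each
    sample, the label predicted by the most models."""
    votes_per_sample = zip(*predictions_per_model)  # regroup votes by sample
    return [Counter(votes).most_common(1)[0][0] for votes in votes_per_sample]


# Made-up predictions from three hypothetical classifiers on four samples
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)
```

Each model is wrong somewhere different, so the vote can be right even when an individual model is not, as long as the errors are not strongly correlated.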

Prismer, 3/15/23, TikTok Instagram YouTube

Nvidia Prismer model for image captioning and zero shot visual question answering. It uses an ensemble or mixture of experts approach. #datascience #machinelearning #nvidia #prismer #imagecaptioning #visualquestionanswering

Langflow: UI for LangChain, 3/17/23, TikTok Instagram YouTube

I think langchain is awesome, but the future is an easy to use UI. Think Alteryx for LLMs. Langflow is a step in the right direction. #datascience #machinelearning #largelanguagemodels #gpt4 #langchain #langflow

Open Source or OpenAI?, 3/18/23, TikTok Instagram YouTube

Let's talk about why enterprises are considering alternatives to ChatGPT by looking to open source. An open source strategy can affect lots of areas outside data science including data governance, IT, legal, and accounting. Let me know what else I missed.

Pair Programming with ChatGPT, 3/19/23, TikTok Instagram YouTube

Pair programming is some of my favorite times as a data scientist. I am starting to use ChatGPT to fill that role lately. It's useful for me. #datascience #machinelearning #pairprogramming #chatgpt #codex

Temperature for LLMs, 3/21/23, TikTok Instagram YouTube

Temperature is an important parameter when working with many models including GPT-3. This video gives a short background on temperature and the best settings when working with large language models. #datascience #machinelearning #largelanguagemodels #gpt3 #gpt4
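
To show what temperature does, here is a minimal sketch of temperature-scaled softmax over made-up logits for three candidate tokens. The function name and values are hypothetical; real models apply the same idea over a full vocabulary.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing by the temperature.
    Lower temperature sharpens the distribution; higher flattens it."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature most of the probability mass lands on the top-scoring token, so sampling is more predictable; at a high temperature the alternatives get more weight and the output is more varied.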

ChatDoctor, 3/22/23, TikTok Instagram YouTube

ChatDoctor is a great example of fine tuning a large language model to get more factually correct output. This is an approach i expect many people to follow. #datascience #machinelearning #largelanguagemodels #chatgpt #chatdoctor #finetuning #instructiontuning

OpenAI Plugins, 3/23/23, TikTok Instagram YouTube

OpenAI plugins! Let's get everyone's APIs working with LLMs! This is a good thing. #largelanguagemodels #langchain #openai #datascience #machinelearning #chatgpt

Scandals in AI, 3/25/23, TikTok Instagram YouTube LinkedIn

My take on Objaverse, Llama, and Alpaca. Not a lot of respect for copyright or contract terms. #largelanguagemodels #datascience #machinelearning #objaverse #alleninstitute #openai #chatgpt #llama #alpaca #meta #databricks

Text2Video, 3/26/23, TikTok Instagram

Text to video models, including text2video. The models are getting better, and there is now a place over at the Hugging Face Hub to find them. #datascience #machinelearning #text2video #stablediffusion #huggingface

Use PEFT and LoRA to efficiently train LLMs, 3/27/23, TikTok Instagram

Short summary of my longer video on efficiently training a large language model using PEFT and LoRA. #datascience #machinelearning #largelanguagemodels #flant5 #peft #LoRA #finetuning

Emily Ocasio Media Coverage AI Model, 3/29/23, TikTok Instagram YouTube LinkedIn

Explaining how Emily Ocasio won second place with her project analyzing media coverage. I like her approach, which highlights a growing trend of using prompting in data science. #datascience #machinelearning #promptengineering #societyforscience #emilyocasio

Vicuna and Open AI, 3/30/23, TikTok Instagram YouTube

Vicuna is awesome, go check it out. It's the latest LLaMA model and very impressive. I ended up cutting out the details on Vicuna since I feel like we have turned the corner on getting GPT-3 performance with open source models. #datascience #machinelearning #llama #vicuna #openai #gpt3 #largelanguagemodels

Twitter Open Source Recommender, 3/31/23, TikTok Instagram YouTube

Twitter open sourced its recommendation algorithm. It's fun to look at someone else's production code and will be useful to people studying recommender systems. But a lot of the important pieces aren't provided and there doesn't seem to be anything earthshattering or unexpected here. #datascience #machinelearning #twitter #recommenders

GPT-4's Impact on Enterprise Analytics, 4/1/23, TikTok

Let's talk about how GPT-4 is going to affect enterprise analytics. My upcoming public talks: AI Summit in Montreal on April 20 & Arize AI event on April 25. #datascience #machinelearning #openai #gpt4 #analytics

SVD: Essential for Recommender Systems, 4/2/23, TikTok

Singular value decomposition is one of many low rank methods when working with matrices. This video shares the intuition for why SVD matters and why it's so widely used in recommender systems, in working with text and images, and even in large language models. The linear algebra class is useful. #datascience #machinelearning #svd #singularvaluedecomposition #matrixalgebra
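
As a rough illustration of the low rank idea, the sketch below finds the largest singular value and its vectors with power iteration, then rebuilds a rank-1 approximation of a tiny ratings matrix. The matrix and helper names are hypothetical, and power iteration is used here only because it fits in a few lines of plain Python; in practice a library SVD routine would be the usual choice.

```python
import math

def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def norm(vector):
    return math.sqrt(sum(x * x for x in vector))

def top_singular_triplet(matrix, iterations=100):
    """Power iteration for the largest singular value and its two vectors."""
    transposed = [list(col) for col in zip(*matrix)]
    n_cols = len(transposed)
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        w = matvec(transposed, matvec(matrix, v))  # apply A^T A to v
        length = norm(w)
        v = [x / length for x in w]
    u = matvec(matrix, v)
    sigma = norm(u)
    return sigma, [x / sigma for x in u], v

# Hypothetical user-by-item ratings that happen to be exactly rank 1.
ratings = [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0]]
sigma, u, v = top_singular_triplet(ratings)

# Rank-1 reconstruction: sigma * u * v^T.
rank_one = [[sigma * ui * vj for vj in v] for ui in u]
print(sigma, rank_one)
```

Because this toy matrix has rank 1, a single singular triplet reproduces it exactly; real ratings data needs more components, and keeping only the largest few is what gives the compression.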

Improving LLMs with Retrieval Augmented Tools, 4/4/23, TikTok

Retrieval Augmented approaches are a great way to improve your LLMs. Deepset, shown in this video, provides a set of tools, but there are many others out there, like Llama-Index, that offer similar functionality. This is one of the most popular use cases with LLMs. #datascience #machinelearning #largelanguagemodels #deepset #llamaindex #retrievalaugmentedmodel

Opus.ai's Santacoder Demo and Tutorial, 4/5/23, TikTok

Opus.ai has a very cool demo! If you want to build similar apps, check out the text to code models. Santacoder is open source and they have shared all the details about training it. #datascience #machinelearning #largelanguagemodels #texttocode #santacoder #bigcode #huggingface #opusai

Meta's SAM Revolutionizes Computer Vision, 4/6/23, TikTok Instagram

Segment Anything (SAM) is a new segmentation model from Meta. It's a huge improvement over the state of the art and is going to change computer vision. Check it out at: https://segment-anything.com/ #datascience #machinelearning #computervision #imagesegmentation #segmentanything #meta See me this month at the: AI Summit in Montreal on April 20 & Arize AI event on April 25

Importance of Baseline Models and Benchmark Datasets, 4/7/23, TikTok Instagram

Baseline models and benchmark datasets are important concepts when working in machine learning and data science. Make sure you build a baseline model early in your project and keep benchmark datasets for important problems. #datascience #machinelearning #baselinemodel #benchmarkdataset #practicaldatascience

Advancements and Varieties in Language Models, 4/8/23, TikTok Instagram

Language Models like ChatGPT can be modified by several methods, including Prompting, Instruction Fine-Tuning, and Reinforcement Learning with Human Feedback. This year we will start seeing lots more varieties of large language chat models trained on different data. #datascience #machinelearning #largelanguagemodels #openai #chatgpt #promptengineering #instructionfinetuning #rlhf #reinforcementlearning #pretrain References: Conservatives Aim to Build a Chatbot of Their Own: https://www.nytimes.com/2023/03/22/business/media/ai-chatbots-right-wing-conservative.html ChatDoctor: A Medical Chat Model Fine-tuned on LLaMA Model using Medical Domain Knowledge - https://arxiv.org/abs/2303.14070 Whose Opinions Do Language Models Reflect? https://arxiv.org/pdf/2303.17548.pdf Natural Language Processing with Deep Learning https://web.stanford.edu/class/cs224n/slides/cs224n-2023-lecture11-prompting-rlhf.pdf

Target Leakage Issues in CrowdAI Dataset, 4/10/23, TikTok

Target leakage in the CrowdAI dataset. Target leakage is a very common problem and everyone should understand it. I have seen even the smartest people and best teams have issues with data or target leakage. These include Harvard, Google, Fast.AI, Andrew Ng, and the SARCOS dataset used by hundreds. #datascience #machinelearning #targetleakage #dataleakage #crowdai #fastai #sarcos Efficient Deduplication and Leakage Detection in Large Scale Image Datasets with a focus on the CrowdAI Mapping Challenge Dataset - https://arxiv.org/abs/2304.02296# Running Code and Failing Models by Rajiv - https://www.datarobot.com/blog/running-code-and-failing-models/ Stand Up for Best Practices: Misuse of Deep Learning in Nature's Earthquake Aftershock Paper https://towardsdatascience.com/stand-up-for-best-practices-8a8433d3e0e8 Reddit: https://www.reddit.com/r/MachineLearning/comments/c4ylga/d_misuse_of_deep_learning_in_nature_journals/ Older video: https://www.youtube.com/watch?v=NaySLPTCgDM

Generative Tools Advancing AI Task Solutions, 4/12/23, TikTok Instagram

So much going on around using generative tools for reasoning with tasks. HuggingGPT, or Jarvis, is focused on solving AI tasks. AutoGPT allows you to select your own task; the video shows another service, AutoAgent, that works similarly. Generative Agents shows how GPT-4 can simulate human-like behavior. #datascience #machinelearning #gpt3 #openai #hugginggpt #jarvis #autogpt #sims Let me know which one I should dig deeper into. JARVIS / HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace: https://arxiv.org/pdf/2303.17580.pdf https://github.com/microsoft/JARVIS AutoGPT: https://github.com/Torantulino/Auto-GPT Generative Agents: Interactive Simulacra of Human Behavior: https://arxiv.org/abs/2304.03442

Scaling Laws for Smaller Accurate Models, 4/13/23, TikTok Instagram

Using scaling laws to help us get smaller models with the same accuracy! Based on a blog post by de Vries. #datascience #machinelearning #largelanguagemodels #scalinglaws #chinchilla Go smol or go home: https://www.harmdevries.com/post/model-size-vs-compute-overhead/ Scaling Laws for Neural Language Models: https://arxiv.org/abs/2001.08361 Training Compute-Optimal Large Language Models: https://arxiv.org/abs/2203.15556 Scaling Laws Video: https://www.youtube.com/watch?v=NvgNI3waAy4

Meta's Animated Drawings: Fun, Innovative, 4/14/23, TikTok Instagram

Animated Drawings is a really fun model from Meta. It can take a sketch drawing and then animate it. Great example of combining several image models together. #datascience #machinelearning #animateddrawings #meta #FAIRAnimatedDrawings #imageclassification #imagesegmentation #posedetection Demo: https://sketch.metademolab.com/ Code: https://github.com/facebookresearch/AnimatedDrawings Paper: https://arxiv.org/pdf/2303.12741.pdf Project website: http://www.fairanimateddrawings.com/

Hustle, Wait, Laugh Amidst Nonresponses, 4/14/23, TikTok

#duet with @the.rachel.woods #rachelwoods Go hustle, but don't take it personally when they don't respond. Instead wait your time. And then embarrass yourself in public for 10X. 😀

Dolly: Databricks' Commercial Open Source Model, 4/15/23, TikTok

Dolly from Databricks is an open source, instruction fine-tuned large language model that can be used for commercial purposes! Databricks has taken the time to share the dataset and training scripts, so it's going to be a great place to get started. #datascience #machinelearning #largelanguagemodels #databricks #dolly #instructionfinetuning #finetuning References: Dolly blog post: https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm Dolly at Hugging Face: https://huggingface.co/databricks Dolly Github: https://github.com/databrickslabs/dolly

Growing Importance of Vector Databases, 4/16/23, TikTok Instagram

As we store more information as vectors or embeddings, vector databases are gaining importance. For small amounts of embeddings, numpy or FAISS might work ok. As your needs grow, there are many vector databases from vendors like PineCone, Chroma, Weaviate, and Milvus. #datascience #machinelearning #embeddings #vectordatabase #pinecone #chroma #weaviate #milvus #faiss
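
For a sense of what the small-scale case looks like, here is a minimal brute-force similarity search in plain Python. The embeddings, document ids, and function names are invented for illustration; a real setup would typically use numpy or FAISS for speed, and a vector database once the collection outgrows memory.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, index, k=2):
    """Brute-force search: score every stored vector and return the best k ids."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical 3-dimensional embeddings keyed by document id.
index = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.3, 0.1],
    "taxes": [0.0, 0.2, 0.9],
}
results = top_k([1.0, 0.2, 0.0], index, k=2)
print(results)
```

Every query scans the whole index, so cost grows linearly with the number of stored vectors. That is the scaling limit that approximate indexes and dedicated vector databases are meant to address.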

MiniGPT-4: Advanced Multimodal Machine Learning Model, 4/17/23, TikTok Instagram

MiniGPT-4 brings us a multimodal model! It consists of a vision encoder with a pretrained ViT and an advanced Vicuna large language model. This gives us the ability to do things like ask questions about a photo. #datascience #machinelearning #minigpt4 #gpt4 #gpt-4 #openai #multimodal #flamingo Website: https://minigpt-4.github.io/ Paper: https://github.com/Vision-CAIR/MiniGPT-4/blob/main/MiniGPT_4.pdf Code: https://github.com/Vision-CAIR/MiniGPT-4 Background Pic from eugenia loli : https://www.pinterest.com/pin/457256168397962646/

Flashback Post: Reinforcement Learning Discussion, 4/18/23, TikTok

#onthisday a flashback post - let me know if you all like these. Reinforcement learning here. @rajistics

OpenAI Scrutinized, Reddit Restricts, Alternative Dataset Emerges, 4/20/23, TikTok Instagram

Examining the data used for training our LLMs. OpenAI is running into trouble in Europe since it won't disclose exactly what was used for their training data. Reddit is no longer letting anyone use their data for commercial use for free. Finally, the folks over at together.xyz have assembled an open-source recipe of the LLaMa training dataset. #datascience #machinelearning #largelanguagemodels #openai #openaiban #chatgpt #reddit #together.xyz #redpajama #llama RedPajama: https://github.com/togethercomputer/RedPajama-Data Open AI in Europe: https://www.govtech.com/products/openais-data-practices-cause-it-problems-in-europe Reddit Ban: https://arstechnica.com/information-technology/2023/04/reddit-will-start-charging-ai-models-learning-from-its-extremely-human-archives/

Python Optimal Transport Library Overview, 4/23/23, TikTok

Python Optimal Transport is an open source Python library providing several solvers for optimization problems related to Optimal Transport for signal, image processing, and machine learning. Walking through a simple example using earth mover's distance (EMD) and then moving to the Sinkhorn Knopp Algorithm. You can see examples of the cost matrix and the effects of regularization. #datascience #machinelearning #optimization #sinkhornknopp #pythonoptimaltransport #earthmoversdistance #regularization Python Optimal Transport: https://pythonot.github.io/ Background Photo by Curioso Photography: https://www.pexels.com/photo/aerial-view-of-white-buildings-343696/
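
The Sinkhorn-Knopp step can be sketched in a few lines of plain Python. This is a hand-rolled illustration with made-up distributions and a made-up cost matrix, not the Python Optimal Transport API; the library's own solvers are the better choice for real work.

```python
import math

def sinkhorn(a, b, cost, reg, iterations=500):
    """Entropy-regularized transport plan via Sinkhorn-Knopp row/column scaling."""
    rows, cols = len(a), len(b)
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u = [1.0] * rows
    v = [1.0] * cols
    for _ in range(iterations):
        # Rescale rows, then columns, so the plan's marginals approach a and b.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(rows)) for j in range(cols)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(cols)] for i in range(rows)]

# Hypothetical example: move mass from two source bins to two target bins.
source = [0.5, 0.5]
target = [0.25, 0.75]
cost = [[0.0, 1.0], [1.0, 0.0]]  # moving mass to the other bin costs 1

plan = sinkhorn(source, target, cost, reg=0.5)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(2)) for j in range(2)]
print(plan)
```

The `reg` parameter controls the regularization effect mentioned above: smaller values push the plan toward the sharper earth mover's solution, while larger values spread mass more evenly across the cost matrix.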

Interactive Sklearn Examples on Spaces, 4/24/23, TikTok Instagram

Spaces gives you great interactive demos of many popular sklearn examples. It's a great place to browse and even contribute back by adding more. #datascience #machinelearning #sklearn #scikit #huggingface All the sklearn documentation spaces are here: https://huggingface.co/sklearn-docs Anomaly Detection: https://huggingface.co/spaces/sklearn-docs/anomaly-detection Visualizing the Stock Market: https://huggingface.co/spaces/sklearn-docs/Visualizing_the_stock_market_structure Dimensionality Reduction: https://huggingface.co/spaces/sklearn-docs/MNIST-Dimensionality-Reduction Decision surfaces: https://huggingface.co/spaces/sklearn-docs/ensemble-trees-decision-surface Photo by 🇸🇮 Janko Ferlič https://unsplash.com/photos/sfL_QOnmy00

Exploring ChatGPT's Potent Data Analysis, 4/25/23, TikTok Instagram

Doing data analysis with large language models like ChatGPT. It's going to be amazing as these technologies let us combine our data, text, and code understanding. #datascience #machinelearning #chatgpt #openai #dataanalysis The inside story of ChatGPT's astonishing potential: https://www.ted.com/talks/greg_brockman_the_inside_story_of_chatgpt_s_astonishing_potential/c Photo By Bruce Hong: https://unsplash.com/photos/OI8YnODoWms

Entropy's Role in Machine Learning, 4/27/23, TikTok Instagram

Entropy can be a useful measure in machine learning. Entropy and information gain are used in building decision trees. I have also seen entropy used in feature engineering. Here is a short conceptual understanding of entropy. This video is based on the excellent blog post on entropy (that also provides the math). #datascience #machinelearning #featureengineering #informationgain #entropy #decisiontrees Background Photo: Hans-Peter Gauster https://unsplash.com/photos/3y1zF4hIPCg Entropy: How Decision Trees Make Decisions by Sam T: https://towardsdatascience.com/entropy-how-decision-trees-make-decisions-2946b9c18c8
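
A small worked example may help here. The sketch below computes Shannon entropy for a set of labels and the information gain of a candidate split, which is the quantity a decision tree compares when choosing where to split. The labels and splits are toy data made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child groups."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Toy labels: a 50/50 mix is maximally uncertain for two classes (1 bit).
parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]  # each child is pure
useless_split = [["yes", "no"], ["yes", "no"]]  # each child is still 50/50

print(entropy(parent))
print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

The perfect split removes all of the uncertainty, so its gain equals the parent's entropy, while the useless split leaves the uncertainty unchanged and gains nothing.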

Enhancing Language Models: Techniques Explained, 4/28/23, TikTok Instagram

Deep dive on how to improve large language models. I provide an introduction to zero-shot and few-shot learning methods. I also discuss the role of in-context learning and emergence. For fine-tuning, the video explains instruction tuning, reinforcement learning with human feedback (RLHF), reinforcement learning with AI feedback (RLAIF), and parameter efficient fine tuning (PEFT). I will also have a larger version of this video on my YouTube where it's easier to see the slides. #datascience #machinelearning #largelanguagemodels #finetuning #prompting #peft #rlhf #rlaif #fewshotlearning Background Photo from Deepmind: https://unsplash.com/photos/4QVqSh4VvP4 See the full presentation which included this topic: https://youtu.be/dKBD-3hnjW0

LLaMA and LAION Face Licensing Issues, 4/29/23, TikTok Instagram

Models and datasets have specific definitions. Models come with at least two licenses nowadays; this has been an issue for LLaMA, where the code for the model architecture and the weights are licensed differently. Similarly for datasets, the LAION dataset has faced criticism because it deflects responsibility by referring to itself as an index by researchers. #datascience #machinelearning #laoin #llama #copyright LLama license issue: https://github.com/facebookresearch/llama/pull/234 LAION copyright: https://www.vice.com/en/article/pkapb7/a-photographer-tried-to-get-his-photos-removed-from-an-ai-dataset-he-got-an-invoice-instead

LLMs: Shaping Future of Data Science, 4/30/23, TikTok Instagram

Best practices for prompting are emerging. A couple of simple rules: start with an API-based LLM and focus on building good prompts. This new approach is going to reduce the need for traditional NLP models. #datascience #machinelearning #largelanguagemodels #promptengineering #openai #nlp Harnessing LLMs by Peter Bull - https://www.linkedin.com/pulse/harnessing-llms-part-i-peter-bull Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond: https://arxiv.org/pdf/2304.13712.pdf

MLCopilot: Enhancing Machine Learning Automation, 5/1/23, TikTok Instagram

Automating machine learning with Large Language Models (LLMs). While it's possible to ask ChatGPT to provide code for building a prediction model, MLCopilot goes way beyond that. It uses a memory and knowledge bank of past experiences solving ML tasks. #datascience #machinelearning #automl #mlcopilot #largelanguagemodels #llms MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks: https://arxiv.org/abs/2304.14979 Background image by Michael Dziedzic: https://unsplash.com/photos/aQYgUYwnCsM

DePlot Transforms Plots for LLM Queries, 5/3/23, TikTok Instagram

DePlot translates plots into readable tables that an LLM can query. It's based on the MatCha architecture with more fine-tuning on plots. Nice example of visual language reasoning. #datascience #machinelearning #deplot #matcha #documentai #visualreasoning #multimodal Demo at: https://huggingface.co/spaces/fl399/deplot_plus_llm Paper at: https://arxiv.org/pdf/2212.10505.pdf Background from Maria Krasnova: https://unsplash.com/photos/qD7tpy_VozY

Prompt Injection: Major LLM Security Concern, 5/3/23, TikTok Instagram

Prompt injection attacks are a major security concern when using large language models (LLMs) like ChatGPT. They allow attackers to override the developer's intentions. Right now there aren't any 100% effective methods for stopping this attack. Prompt injection explained: https://simonwillison.net/2023/May/2/prompt-injection-explained/ Background image by Tim Mossholder: https://unsplash.com/photos/WZepC_pvKKg

Building AI Question/Answer Application Guide, 5/4/23, TikTok Instagram

Building a question / answer application using a large language model is a great starter project. You will need to use a vector database and prompt an LLM. It's a great way to start a journey into practical applications of generative AI. #datascience #machinelearning #questionanswer #generativeAI #largelanguagemodels #vectordatabase Knowledge Retrieval Architecture for LLM's (2023): https://mattboegner.com/knowledge-retrieval-architecture-for-llms/ Deepset: https://haystack.deepset.ai/tutorials PineCone: https://docs.pinecone.io/docs/examples LangChain: https://python.langchain.com/en/latest/use_cases/question_answering.html LLama-Index: https://gpt-index.readthedocs.io/en/stable/ Background image: https://unsplash.com/photos/MfBnqUOz_qY

Balancing Hallucinations Risks in Large Language Models, 5/6/23, TikTok Instagram

Hallucinations from large language models are a concern. However, balance them against the effectiveness of these models and the risks of using such a model. Too many people are running scared of hallucinations. (And we could use more people on ML Twitter retiring). #datascience #machinelearning #largelanguagemodels #hallucinations #practicalai

GPT-4 Excels in Causal Reasoning, 5/7/23, TikTok Instagram

GPT-4 is showing amazing results in causal reasoning. For practical purposes, experiments are more useful than causal modeling. However, this paper shows the complexity of GPT-4. Causal Reasoning and Large Language Models: Opening a New Frontier for Causality: https://arxiv.org/abs/2305.00050 Background by George Dagerotip: https://unsplash.com/photos/ScITlIhACwo

Scale Hiring for AI Collaboration Roles, 5/9/23, TikTok Instagram

Labeling companies like Scale are hiring people to build and improve models based on these skills. By next year people in these fields should get ready to work alongside AI. Hat tip to @Rachel | The AI Exchange 🤖 on Scale hiring in these fields #datascience #machinelearning #chatgpt #largelanguagemodels #trainingml Next Generation LLM skills: Personal Training Accounting & Tax Biology Business & Industry Chemistry Computer Science Data Science & Programming CS Math Biology Chemistry and Physics Economics Finance Health Coach Historians Human Resources K12 Teachers Lawyers Marketing Mathematics Nutritionists Physics Poetry Writing Programmers Self Help Sports Journalist Travel & Transportation Writing

ImageBind: Multimodal Unsupervised Data Binding AI, 5/11/23, TikTok Instagram

ImageBind is the first AI model capable of binding data from six modalities at once without the need for explicit supervision. It recognizes the relationships between these modalities: images and video, audio, text, depth, thermal, and inertial measurement units (IMUs). #datascience #machinelearning #embeddings #imagebind #multimodal Imagebind: https://ai.facebook.com/blog/imagebind-six-modalities-binding-ai/ ImageBind: One Embedding Space To Bind Them All - https://arxiv.org/abs/2305.05665 Code: https://github.com/facebookresearch/ImageBind Taylor Swift embeddings by Krystal Kirkland: https://www.linkedin.com/feed/update/urn:li:activity:7062203006427013120 Background image by Alina Grubnyak: https://unsplash.com/photos/ZiQkhI7417A

AI Developments and Challenges: May 2023, 5/12/23, TikTok Instagram

Jokes explained - news in mid-May 2023. Google introduced Bard2, which performs on par with GPT3.5 and Claude from Anthropic. Google also announced it is starting to train Gemini, a GPT-5 competitor. The US White House summoned AI leaders to lecture them on privacy and other concerns. Anthropic published another paper in its typical moralizing tone. NVIDIA keeps racking up big gains, making huge revenue off AI. The H100 is their latest state-of-the-art GPU and they can't make enough. Google and Amazon both have chip sources independent of NVIDIA (but AWS still uses lots of NVIDIA). We now have three publicly available models that are better than the average human at a wide range of tasks. Sam Altman is starting to invest in fusion and Microsoft is the first customer. Cohere is struggling with staying top tier - it can't compete with GPT4 and is instead focusing on other models like its new reranker. Cohere is based in Toronto. Together.xyz and Mosaic have open sourced smaller LLMs that can be commercially used. Hinton is leaving AI in an oh-so-dramatic manner. Like he wasn't fully responsible and profiting off AI his entire life. Microsoft is doing well on the enterprise market, selling million-dollar deals with enterprises for GPT3/4. Amazon has nothing but the sound of customers trying to escape. It announced its Bedrock, but right now Bedrock looks more like a swamp of poor contenders. A PIP is a performance plan that Amazon is notorious for using for poor performers. Amazon is well known for having pretty poor working conditions. IBM is jumping into AI by returning to the brand of its previous disaster, Watson. Right now Amazon's Bedrock is a more viable product than WatsonxAI. Scary for anyone that cares about IBM. #datascience #machinelearning #openai #google #cohere #ibm #anthropic #microsoft #nvidia #amazon

LangChain Introduces Advanced Agent 'Plan and Execute', 5/14/23, TikTok Instagram

LangChain added a new agent Plan and Execute. Looking forward to the more advanced use cases people will build with it. This was inspired by BabyAGI and the "Plan and Solve" paper. #datascience #machinelearning #largelanguagemodels #langchain Lang Chain Agent: https://python.langchain.com/en/latest/modules/agents/plan_and_execute.html Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models: https://arxiv.org/pdf/2305.04091.pdf Background by charlesdeluvio: https://unsplash.com/photos/OWkXt1ikC5g

Meta AI Sharing Top-notch Models, 5/14/23, TikTok

Replying to @Sam This video won't be popular but I have to speak the truth. Meta AI has been really sharing out top notch open source models all across AI. #machinelearning #datascience #metaai #google Background by ThisisEngineering RAEng: https://unsplash.com/photos/bcqDxjddPGk

Enhancing Neural Networks with Number Size Techniques, 5/16/23, TikTok Instagram

Thinking about the size of numbers becomes important when working with neural networks. This video touches on different techniques like using bfloat16 and quantization. #datascience #machinelearning #bfloat16 #quantization #largelanguagemodels Links: Accelerating Large Language Models with Mixed-Precision Techniques: https://lightning.ai/pages/community/tutorial/accelerating-large-language-models-with-mixed-precision-techniques/ BFloat16: The secret to high performance on Cloud TPUs: https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus Llama.cpp: https://github.com/ggerganov/llama.cpp/ A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using Hugging Face Transformers, Accelerate and bitsandbytes: https://huggingface.co/blog/hf-bitsandbytes-integration Background by Umberto: https://unsplash.com/photos/jXd2FSvcRr8
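
Both ideas can be simulated with the standard library. The sketch below approximates bfloat16 by keeping only the top 16 bits of a float32 (real hardware usually rounds rather than truncates, so treat this as a simplification), and shows a basic absmax int8 quantization round trip. The weights and function names are hypothetical.

```python
import struct

def to_bfloat16(value):
    """Simulate bfloat16 by keeping only the top 16 bits of the float32 encoding."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

def quantize_int8(values):
    """Absmax quantization: map floats onto integers in [-127, 127].

    Assumes at least one value is nonzero.
    """
    scale = max(abs(v) for v in values) / 127
    return [round(v / scale) for v in values], scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.12, -0.5, 0.33, 0.9]  # hypothetical model weights

approx = to_bfloat16(3.14159)      # only about 3 significant decimal digits survive
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
print(approx, quantized, restored)
```

bfloat16 keeps float32's exponent range but gives up mantissa precision, while int8 quantization trades precision for a 4x smaller footprint than float32 at the cost of a rounding error of at most half the scale per weight.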

Active Learning Optimizes Data Labeling Process, 5/18/23, TikTok Instagram

Active learning uses an algorithm to help select what data to label. Ideally using this approach people can get comparable model results using less labeled data. #datascience #machinelearning #activelearning #datalabeling Active Learning Strategies from Neptune.ai: https://neptune.ai/blog/active-learning-strategies-tools-use-cases
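
One common selection strategy is uncertainty sampling, sketched below: given a model's predicted probabilities for unlabeled examples, send the ones closest to 0.5 to the labelers first. The scores, ids, and function name are made up for illustration, and this is only one of several strategies covered in write-ups like the one linked above.

```python
def select_for_labeling(probabilities, budget):
    """Uncertainty sampling: pick the items whose predicted probability is closest to 0.5."""
    ranked = sorted(probabilities, key=lambda item: abs(probabilities[item] - 0.5))
    return ranked[:budget]

# Hypothetical model scores (probability of the positive class) for unlabeled examples.
unlabeled_scores = {"a": 0.98, "b": 0.52, "c": 0.10, "d": 0.45, "e": 0.70}

to_label = select_for_labeling(unlabeled_scores, budget=2)
print(to_label)
```

The model is already confident about examples like "a" and "c", so labeling them adds little; the budget goes to the borderline cases instead.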

OpenAI Answers Key Questions: Interview, 5/19/23, TikTok Instagram

Exclusive interview with OpenAI asking all the questions you wished to ask. Including: What's the deal with the name? How do you feel about Open Source? Is your goal to be a 100 billion dollar company? Is your moat eroding? #datascience #machinelearning #openai

Mitigating Bias in Generative AI, 5/19/23, TikTok Instagram

Bias in Generative AI. This post is based on a blog post by Textio on bias in generative AI using an example of job postings. A great reminder that it's very easy for generative models to introduce bias and problematic outputs. #datascience #machinelearning #bias #generativeai #openai Textio blog post: https://textio.com/blog/mindful-ai-crafting-prompts-to-mitigate-the-bias-in-generative-ai/115959775665 Background by Manuel: https://unsplash.com/photos/CANL3bzp6wU

GPT-4 Overhauls Human Data Annotation, 5/20/23, TikTok Instagram

An emerging trend of using large language models like GPT-4 for labeling data instead of using humans to annotate data: #datascience #machinelearning #gpt4 #alpaca #labelingdata #annotatingdata Background by Erol Ahmed: https://unsplash.com/photos/Y3KEBQlB1Zk ChatDoctor: https://github.com/Kent0n-Li/ChatDoctor GPT-4 Labeling: https://www.artisana.ai/articles/gpt-4-outperforms-elite-crowdworkers-saving-researchers-usd500-000-and-20

Andrew Ng Discusses No-Test-Set Approach, 5/22/23, TikTok Instagram

Andrew Ng wrote recently on this no test set approach that he is seeing when people are using prompt engineering. This is very different than traditional machine learning approaches that rely on a test set. The video reviews some of the tradeoffs around this approach. #datascience #machinelearning #promptengineering #validation Andrew Ng Batch: https://www.deeplearning.ai/the-batch/issue-197/

Eric Hartford Unveils Uncensored WizardLM Models, 5/25/23, TikTok Instagram

Uncensored models are here. Eric Hartford has been building the WizardLM series of models and sharing how he has been training the models. These models remove a lot of instructions that are perceived to carry certain values. One consequence is that models that are less aligned may actually perform better. #datascience #machinelearning #wizardlm #uncensoredmodels Uncensored Models: https://erichartford.com/uncensored-models WizardLM: https://huggingface.co/ehartford/WizardLM-7B-Uncensored Vicuna Unfiltered: https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered Sparks of AGI: https://www.youtube.com/watch?v=qbIk7-JPB2c Background by Jean Carlo Emer: https://unsplash.com/photos/5o1YssX5naM

QLoRA Enables Efficient 4-Bit Finetuning, 5/26/23, TikTok Instagram

QLoRA allows for an efficient finetuning approach that uses 4-bit quantization. This allows people to fine-tune models using a single GPU. It's now possible to fine-tune a 33B parameter model in less than 24 GB of memory. #datascience #machinelearning #lora #peft #qlora #finetuning #largelanguagemodels Paper: https://arxiv.org/abs/2305.14314 Code+Demo: https://github.com/artidoro/qlora Samples: https://colab.research.google.com/drive/1kK6xasHiav9nhiRUJjPMZb4fAED4qRHb?usp=sharing Colab: https://colab.research.google.com/drive/17XEqL1JcmVWjHkT-WczdYkJlNINacwG7?usp=sharing Background by Vishnu Mohanan: https://unsplash.com/collections/1779288/lb---brain-dump
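
A rough back-of-the-envelope check of the 24 GB figure, as a sketch only. It counts the base model weights and nothing else (no adapter weights, activations, or quantization constants), so treat the result as a lower bound. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory needed for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

params_33b = 33e9  # 33B parameter model, as in the description above

fp16_gb = weight_memory_gb(params_33b, 16)  # 16-bit weights
int4_gb = weight_memory_gb(params_33b, 4)   # 4-bit quantized weights

print(f"16-bit weights: {fp16_gb:.1f} GB")  # 66.0 GB
print(f"4-bit weights:  {int4_gb:.1f} GB")  # 16.5 GB
```

At 4 bits the weights alone leave some headroom under 24 GB, while 16-bit weights would not fit at all.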

Reality of AI Risks Beyond Hype, 5/27/23, TikTok Instagram

Deepmind and OpenAI want everyone to focus on extreme risks of AI. This helps them hype up AI and make themselves more attractive. The reality is there are far greater and more mundane risks that are occurring today. Let's talk about the data these models are trained on, the biases in these models, how the models are being used, and the social and economic implications of these models. #datascience #machinelearning #modelbias #modelrisk Model evaluation for extreme risks: https://arxiv.org/pdf/2305.15324.pdf Github Copilot Litigation: https://githubcopilotlitigation.com/ Stable Diffusion Lawsuit: https://stablediffusionlitigation.com/

GPUs Essential for Deep Learning Power, 5/30/23, TikTok Instagram

GPUs power a lot of deep learning and large language models. A key is the use of linear algebra operations, like matrix multiplication, that can be parallelized across all the cores in a GPU. #datascience #machinelearning #deeplearning #nvidia #matrixmultiplication Pie example from: https://www.mathsisfun.com/algebra/matrix-multiplying.html Bruna Branco background: https://unsplash.com/photos/FWaV69D5b8k
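
To see why matrix multiplication parallelizes so well, here is a minimal sketch in which every output cell is computed as its own independent task. The numbers are made up for illustration and are not taken from the linked pie example. Python threads will not actually make this faster; the point is only that no cell depends on any other, which is what lets a GPU spread the work across its cores.

```python
from concurrent.futures import ThreadPoolExecutor

def cell(a, b, i, j):
    """One output cell: dot product of row i of a with column j of b."""
    return sum(a[i][k] * b[k][j] for k in range(len(b)))

def matmul_parallel(a, b):
    """Compute a @ b by submitting every output cell as a separate task."""
    rows, cols = len(a), len(b[0])
    with ThreadPoolExecutor() as pool:
        futures = [[pool.submit(cell, a, b, i, j) for j in range(cols)]
                   for i in range(rows)]
        return [[f.result() for f in row] for row in futures]

prices = [[3, 4, 2]]              # price per item type (1 x 3)
sold = [[10, 5], [2, 8], [4, 1]]  # items sold per type on two days (3 x 2)

revenue = matmul_parallel(prices, sold)
print(revenue)  # [[46, 49]]
```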

Countries Permit Copyrighted Material for AI Training, 6/1/23, TikTok Instagram

Japan said it was acceptable to use copyrighted material such as text and images to train AI. This is also the approach of the United States, and other countries like Israel have followed the US. All of this makes it much easier for people to train AI models within these countries. #datascience #machinelearning #copyright #fairuse Israel: https://www.project-disco.org/intellectual-property/011823-israel-ministry-of-justice-issues-opinion-supporting-the-use-of-copyrighted-works-for-machine-learning/ Japan: https://technomancers.ai/japan-goes-all-in-copyright-doesnt-apply-to-ai-training/ Background by Dario Seretin: https://unsplash.com/photos/AGgOAqqGlT4

Choosing Between Large and Small Models, 6/2/23, TikTok Instagram

Deciding whether to use a Large Language Model or a smaller model? This video explores the tradeoffs between both approaches based on the latest research (May 2023) on the performance of these models. The video covers the effectiveness of LLMs, where smaller models best LLMs, and criteria for deciding between the two. #machinelearning #datascience #largelanguagemodels

Challenges with Open-source Large Language Models, 6/3/23, TikTok Instagram

Open source LLMs, while they seem popular, are not easy to get running in production settings. The current open source LLMs, while getting better, still lag behind the commercial APIs in many areas. This video highlights a few of them. #datascience #machinelearning #largelanguagemodels #openai #anthropic #flant5 MMLU Leaderboard: https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu The False Promise of Imitating Proprietary LLMs: https://arxiv.org/abs/2305.15717 Background by Qiming Chen: https://unsplash.com/photos/lzCH2_8qRH8

Building Custom Domain Large Language Model, 6/4/23, TikTok

Let's dig into the details of building your own large language model on a custom domain. The LLaVA-Med paper gives a great breakdown of how they built their model. The video goes through their data preparation, training, and evaluation of the model. #datascience #machinelearning #largelanguagemodel #vicuna #llava-med LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day: https://arxiv.org/pdf/2306.00890.pdf Open LLM Leaderboard: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis: https://arxiv.org/pdf/2305.13230.pdf Annotated graph by Sebastian Raschka Background by R O: https://unsplash.com/photos/FFA8yd4OynY

Optimizing GPU Memory for Transformers, 6/7/23, TikTok Instagram

Making efficient use of GPU memory when training transformer models. This video covers kernel overhead, optimizer states, activation memory, and gradient memory. #machinelearning #transformers #datascience #deeplearning #nvidia #huggingface Efficient Training on a Single GPU: https://huggingface.co/docs/transformers/perf_train_gpu_one
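
A minimal sketch of how the per-parameter pieces add up. The byte counts are assumptions: a commonly quoted rule of thumb for full-precision training with AdamW (4 bytes for weights, 4 for gradients, 8 for optimizer states). Activation memory depends on batch size and sequence length and is not modeled here; kernel overhead is left as a parameter you would have to measure. The function name is hypothetical.

```python
def training_memory_gb(n_params, weight_bytes=4, grad_bytes=4,
                       optim_bytes=8, overhead_gb=0.0):
    """Rough training memory in GB, excluding activations.

    Default byte counts are assumptions for fp32 training with AdamW.
    """
    per_param = weight_bytes + grad_bytes + optim_bytes
    return n_params * per_param / 1e9 + overhead_gb

one_b = training_memory_gb(1e9)  # a 1B parameter model
print(f"1B params, before activations: {one_b:.1f} GB")  # 16.0 GB
```

Swapping in a smaller `optim_bytes` shows how much a lighter optimizer could save under these assumptions.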

Exploring Large Language Models' Data Pipelines, 6/8/23, TikTok

So what's inside those large language models? This video explains the data pipeline for high-quality training data used in the latest LLMs like Falcon and LLaMa. In the case of RefinedWeb the pipeline ends up with 12% of the original data. #datascience #machinelearning #largelanguagemodels #commoncrawl Appreciating the complexity of large language models data pipelines: https://blog.christianperone.com/2023/06/appreciating-llms-data-pipelines/ The RefinedWeb Dataset for Falcon LLM: https://arxiv.org/pdf/2306.01116.pdf RedPajama Dataset: https://github.com/togethercomputer/RedPajama-Data Common Crawl: https://commoncrawl.org/ Background by Emil Widlund: https://unsplash.com/@emilwidlund
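
To give a feel for how a pipeline can discard so much of the raw data, here is a toy sketch with two stages: a length filter and exact deduplication. This is not the RefinedWeb or RedPajama pipeline; the `clean_corpus` function and the `min_words` threshold are illustrative assumptions, and real pipelines add language identification, quality heuristics, and fuzzy deduplication.

```python
import hashlib

def clean_corpus(docs, min_words=5):
    """Drop very short documents, then drop exact duplicates."""
    seen = set()
    kept = []
    for doc in docs:
        text = " ".join(doc.split())          # normalize whitespace
        if len(text.split()) < min_words:     # length filter
            continue
        digest = hashlib.sha256(text.lower().encode("utf-8")).hexdigest()
        if digest in seen:                    # exact duplicate
            continue
        seen.add(digest)
        kept.append(text)
    return kept

raw = [
    "Click here",
    "The quick brown fox jumps over the lazy dog.",
    "The  quick brown fox jumps over the lazy dog.",
    "Large language models need high quality training data.",
]
kept = clean_corpus(raw)
print(f"kept {len(kept)} of {len(raw)} documents")  # kept 2 of 4 documents
```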

Ranking Open-Source LLMs: Various Methodologies, 6/10/23, TikTok Instagram

With the growth of open-source LLMs, many leaderboards to rank these models are emerging. Several different methodologies are used, including human evaluation, academic datasets, and evaluation using GPT-4. These are all great, but also remember to use methods that align with your use cases. #datascience #largelanguagemodels #machinelearning #leaderboardsgun

Evaluating Generative Models: An Overview, 6/24/23, TikTok Instagram

Evaluating generative models means considering many factors, including prompts, tokenization, and evaluating generated results. This video should give you an intuition of the different ways to evaluate models. This is inspired by the recent Hugging Face blog post on MMLU evaluation. #datascience #machinelearning #largelanguagemodels #modelevaluation #MMLU #leaderboards What's going on with the Open LLM Leaderboard? https://huggingface.co/blog/evaluating-mmlu-leaderboard

Understanding AI: Growth and Impact, 6/25/23, TikTok Instagram

A simple explanation of what AI is. The video touches upon the impact of AI, how AI works with a practical example, and some of the reasons AI has grown so much in the last ten years. #datascience #machinelearning #ai #aiexplained

No 'Best' Algorithm in Data Science, 6/26/23, TikTok

Replying to @Davos What's the best algorithm? 🤔 There is no best algorithm! This is an excellent reminder of the no free lunch theorem; no algorithm is always the best in data science. If you need empirical proof, go check out Kaggle competitions, where you will see a variety of winning algorithms. In this video I highlight "#1 solution - generalization with linear regression" as the winning solution for the Kaggle competition - GoDaddy - Microbusiness Density Forecasting. This solution beat over 3000 other teams using a linear model! https://www.kaggle.com/competitions/godaddy-microbusiness-density-forecasting/discussion/395131 #datascience #machinelearning #algorithms #nofreelunch

Omar Uncovers Middle-East Favored Bias in Falcon, 6/27/23, TikTok Instagram

This video is based on work Omar did in tracking down why Falcon was giving results that favored the Middle East. It's an example of how bias can exist in many different places when using models. #datascience #machinelearning #largelanguagemodels #falcon #modelbias Original tweet by Jan Kulveit: https://twitter.com/jankulveit/status/1670735364707721216 Omar Tweet: https://twitter.com/osanseviero/status/1671210627837095942 Background by Michal Mancewicz: https://unsplash.com/photos/_wdOjxXPxUU

LocalLlama Subreddit Cited in Meta Paper, 6/28/23, TikTok Instagram

The LocalLlama subreddit received a citation in a recent paper by Meta. Great reminder of the innovation you can get when models have a large community using them. Rumor is OpenAI will release a new open-source LLM this summer. #datascience #machinelearning #largelanguagemodels #localllama #meta #openai #gpt3 LocalLlama subreddit: https://www.reddit.com/r/LocalLLaMA/ Extending Context Window of Large Language Models via Positional Interpolation: https://arxiv.org/abs/2306.15595 Background by Ella de Kross: https://unsplash.com/s/photos/inside-barn

Emerging Role: AI Engineer with LLMs, 6/30/23, TikTok Instagram

AI Engineer is starting to emerge as a new role. This role works with LLMs and does prompt engineering and fine tuning of models. They typically put together generative AI workflows. This role doesn't require a traditional data science background but should still pay well. #datascience #machinelearning #aiengineer AI Engineer: https://www.latent.space/p/ai-engineer

Consider Resharing Tensorboard Projector Videos, 7/1/23, TikTok

#onthisday tensorboard embedding projector. Let me know if I should reshare these older videos.

Context Length Vital for Language Models, 7/2/23, TikTok Instagram

Context length has grown in importance for large language models. A longer context length lets you pass more information to the model, effectively giving it a larger working memory. While technically it's easy to get a model to work on a longer context length, that doesn't translate into good performance. It is also essential to train the model how to use that context length. #largelanguagemodels #aiengineer #contextlength #openai #anthropic #longchat Background by Sven Mieke: https://unsplash.com/photos/MsCgmHuirDo How Long Can Open-Source LLMs Truly Promise on Context Length? https://lmsys.org/blog/2023-06-29-longchat/ LongEval: https://github.com/DachengLi1/LongChat

WizMap Tutorial for Visualizing Embeddings, 7/3/23, TikTok

Replying to XYZ A quick tutorial using WizMap to visualize embeddings. The process is extracting your embeddings, using dimensionality reduction to get them down to 2 or 3 dimensions, and then plotting these 2 or 3 dimensions. Background by Jason Leung: https://unsplash.com/photos/UMncYEfO9-U WizMap: https://github.com/poloclub/wizmap Older videos: Instructor Embeddings: https://github.com/poloclub/wizmap Tensorboard Projector: https://www.tiktok.com/@rajistics/video/7250980121656446254?lang=en Latent Space: https://www.tiktok.com/@rajistics/video/7141786930118774058?lang=en Latent Space in Stable Diffusion: https://www.tiktok.com/@rajistics/video/7143359375409990958?lang=en

Older Video Predicts Crime #OnThisDay, 7/5/23, TikTok

One of my older videos: predicting crime

NSQL: Simplifying Text to SQL, 7/6/23, TikTok Instagram

Text to SQL is now easier with a large language model released by Numbers Station called NSQL. #largelanguagemodels #nsql #numberstation #machinelearning Introducing NSQL: Open-source SQL Copilot Foundation Models: https://www.numbersstation.ai/post/introducing-nsql-open-source-sql-copilot-foundation-models

OpenAI's New Deprecation Policy Impacts Users, 7/7/23, TikTok Instagram

OpenAI announced their new deprecation policy, and it's going to affect people who are using OpenAI's models in production. They will have to test the new models, retrain any fine-tuned models, and recreate any embeddings. This is one big advantage of open-source models: you can use them as long as you want and then decide when to upgrade. Thanks @gptboss for sharing his feedback on the policy. #largelanguagemodels #machinelearning #openai #opensource OpenAI Blog post: https://openai.com/blog/gpt-4-api-general-availability Deprecation Summary: https://community.openai.com/t/openai-deprecation-summary/289539

GPT-4 & Mixture of Experts Approach, 7/8/23, TikTok Instagram

What makes GPT-4 so special? One big part is the use of a Mixture of Experts approach. Let's start with how Galton used the wisdom of the crowds over 100 years ago to get the weight of a cow accurately. Or, more recently, how the $1 million Netflix competition led to recognition of the power of ensembles in machine learning. And finally, recent research using a Mixture of Experts approach with Large Language Models. Tags: #largelanguagemodels #machinelearning #openai #mixtureofexperts #ensemblelearning #gpt4 Links: GPT-4: 8 Models in One; The Secret is Out - https://pub.towardsai.net/gpt-4-8-models-in-one-the-secret-is-out-e3d16fd1eee0 Mixture-of-Experts Explained: Why 8 smaller models are better than 1 gigantic one: https://alexandrabarr.beehiiv.com/p/mixture-of-experts Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models - https://arxiv.org/abs/2305.14705
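
A minimal sketch of the general Mixture of Experts idea: a gate scores the experts, only the top few run, and their outputs are combined with softmax weights. This is a generic illustration, not a description of how GPT-4 is actually built. The toy experts and the fixed `gate_logits` are assumptions; in a real model the gate scores come from a learned router that looks at the input.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_output(x, experts, gate_logits, top_k=2):
    """Route x to the top_k experts and blend their outputs."""
    ranked = sorted(range(len(experts)),
                    key=lambda i: gate_logits[i], reverse=True)
    top = ranked[:top_k]
    weights = softmax([gate_logits[i] for i in top])
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Four toy "experts"; only two of them run for any given input.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
gate_logits = [0.1, 2.0, 2.0, -1.0]  # hypothetical router scores

result = moe_output(3, experts, gate_logits)
print(result)  # 7.5
```

Experts 1 and 2 tie for the highest score, so each gets weight 0.5 and the result is the average of 6 and 9.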

Code Interpreter: Mixed Reviews from Experts, 7/12/23, TikTok Instagram

Code Interpreter is out and it's pretty amazing at first glance. However, more experienced software developers and people concerned about data analysis might not be as impressed. This video offers a few different perspectives: the OpenAI enthusiast, the crusty software developer, and the skeptical data analysis instructor. #openai #codeinterpreter #dataanalysis #aiagents Code Interpreter release: https://venturebeat.com/ai/code-interpreter-comes-to-all-chatgpt-plus-users-anyone-can-be-a-data-analyst-now/ Aron Brand: https://twitter.com/aron_brand/status/1677305077164285954 Arvind Narayanan: https://twitter.com/random_walker/status/1679109189551939584 Background video from Prompt Engineering: https://www.youtube.com/watch?v=2Ygm6fvR7yM&ab_channel=PromptEngineering

AI Object Detection with YOLOv6, 7/13/23, TikTok

Object detection using YOLOv6. Check out the demo at: https://huggingface.co/spaces/rajistics/yolov6

FTC Leader Khan Challenges Big Tech, 7/14/23, TikTok Instagram

The current FTC leader Khan is willing to confront large tech companies about anticompetitive practices. There is a long history of abuses by tech companies, so expect them to try to limit the work of the FTC. #technologyregulation #ftc

AMD Chips Effectively Run Pytorch 2.0, 7/15/23, TikTok Instagram

AMD chips running Pytorch 2.0! Reviewing the work of MosaicML testing out AMD enterprise chips for training a large language model. The bottom line is that AMD running on Pytorch gives the AI community more options. #largelanguagemodels #machinelearning #gpus #amd #mosaicmscduet

Google's Soundstorm: High-Quality Audio Generation, 7/17/23, TikTok Instagram

Soundstorm is a new audio generation model from Google. It can rapidly generate high-quality audio. Google isn't making this model available yet but in the meantime check out the Hugging Face Audio courses if you want to learn more about machine learning with audio. #datascience #machinelearning #soundstorm #google #audioml SoundStorm: Efficient Parallel Audio Generation - https://arxiv.org/pdf/2305.09636.pdf Hugging Face Audio Transformers Course: https://huggingface.co/learn/audio-course/chapter0/introduction

Llama 2: Improved Performative Chat Model, 7/18/23, TikTok Instagram

Llama 2 is a worthy successor to Meta's original LLaMa model. It performs better (on par with ChatGPT), has a commercial license, and is publicly available. There are many great things about the model, and I recommend reading the paper; it's pretty approachable. Paper: https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/ Model Repo: https://huggingface.co/meta-llama Background video: https://www.youtube.com/watch?v=uAl4qWuuJiw&ab_channel=ThePetCollective

Critically Assess AI Developments: Three Examples, 7/20/23, TikTok Instagram

If an AI story looks too good, approach it critically. This week there are three examples, with GPT-4, gzip, and Llama, where AI influencers jumped on too quickly. Links to the pushback on these stories: GPT-4 performance: https://www.aisnakeoil.com/p/is-gpt-4-getting-worse-over-time Gzip Leakage: https://github.com/bazingagin/npc_gzip/issues/13 Gzip Top 2: https://kenschutte.com/gzip-knn-paper/ Llama leakage: https://www.reddit.com/r/LocalLLaMA/comments/1548cuw/comment/jspgasv/?utm_source=share&utm_medium=web2x&context=3 #datascience #machinelearning #taylorswift #gpt4 #llama #gzip #targetleakage

Human and Environmental Impact of AI, 7/20/23, TikTok Instagram

With the writers' strike in the US, this video reminds us of the human and environmental costs of building AI. Three critical components for building LLMs are data, talent, and hardware. #machinelearning #largelanguagemodels #openai #environment #ethicalai #WGAStrike

Deep Dive into Llama-2 Paper, 7/22/23, TikTok

Replying to @Rajiv Shah | data science & AI Llama-2 deep dive going through the paper by Meta. This is a 10-minute video but it still skips over many great parts of this paper. Go read the paper. #datascience #machinelearning #largelanguagemodels #llama2 https://arxiv.org/pdf/2307.09288.pdf

Stackoverflow and Github Copilot, 7/25/23, TikTok

Stackoverflow and Github Copilot

Video coming on Text Generation in Colab, 7/25/23, TikTok

Video coming on Text Generation in Colab

Start Your Large Language Models Journey, 7/26/23, TikTok Instagram

🚀 Just get started on your journey to learn large language models! 🤔 Is there a lot to learn? Yes! 😅 🤷‍♂️ But is it easy to get started? Yes! 👍 ✅ Go do it!! 🏃‍♂️💨 #datascience #machinelearning #largelanguagemodels #llama2

Classic Debate: Notebooks Vs Scripts, 7/26/23, TikTok

#onthisday a classic debate: notebooks versus scripts

Analyzing Bias in Humans and AI, 7/27/23, TikTok Instagram

Humans are Biased: Generative AI is Even Worse. Check it out at: https://www.bloomberg.com/graphics/2023-generative-ai-bias This is also a great starter project for people to analyze biases of different things or actions you care about. #datascience #machinelearning #biasml #generativeai

Key Transformer Enhancements: Rotary Embeddings, Multi-Query, 7/29/23, TikTok Instagram

Three major improvements to the transformer architecture that everyone should know. They include Flash Attention, Rotary Positional Embeddings, and Multi-Query Attention. #machinelearning #largelanguagemodels #positionalencodings #flashattention #mulitqueryattention Useful Links: Learning position with Positional Encoding: https://www.scaler.com/topics/nlp/positional-encoding/ ROFORMER: ENHANCED TRANSFORMER WITH ROTARY POSITION EMBEDDING: https://arxiv.org/pdf/2104.09864.pdf Rotary Embeddings: A Relative Revolution - https://blog.eleuther.ai/rotary-embeddings/ Flash Attention: https://github.com/Dao-AILab/flash-attention GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints: https://arxiv.org/abs/2305.13245
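
A small sketch of the core idea behind rotary positional embeddings: each pair of dimensions in a query or key vector is rotated by an angle proportional to its position, so the attention score ends up depending only on the relative distance between the two positions. This is a plain-Python illustration with made-up vectors, not an optimized or library implementation; the `base` of 10000 follows the usual convention.

```python
import math

def rotate(vec, position, base=10000.0):
    """Apply a rotary embedding to vec (even length) at a given position."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)   # per-pair rotation angle
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [1.0, 2.0, 3.0, 4.0]    # toy query vector
k = [0.5, -1.0, 2.0, 0.0]   # toy key vector

# Same relative offset (3), very different absolute positions.
score_near = dot(rotate(q, 5), rotate(k, 2))
score_far = dot(rotate(q, 13), rotate(k, 10))
print(abs(score_near - score_far) < 1e-9)  # True
```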

Deploying Large Language Models Efficiently, 7/30/23, TikTok Instagram

Some tips for deploying large language models like Llama. Start by building some benchmarks for your tasks to assess how your model performs on different GPUs. If it's too slow, think about using a smaller model, using quantization, and looking for an improved model serving solution. I mentioned a lot of packages, but this is a fast-moving area. #largelanguagemodels #deployment #quantization Hamel's LLM inference analysis: https://hamel.dev/notes/llm/03_inference.html
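
A minimal sketch of what a first latency benchmark could look like. The `benchmark` helper and `fake_generate` are hypothetical stand-ins; you would replace `fake_generate` with a call to your own model or serving endpoint and use prompts that reflect your actual task.

```python
import statistics
import time

def benchmark(generate, prompts, warmup=1):
    """Time generate() on each prompt after a few warmup calls."""
    for prompt in prompts[:warmup]:
        generate(prompt)  # warmup runs are not timed
    latencies = []
    for prompt in prompts:
        start = time.perf_counter()
        generate(prompt)
        latencies.append(time.perf_counter() - start)
    return {
        "n": len(latencies),
        "mean_s": statistics.mean(latencies),
        "max_s": max(latencies),
    }

def fake_generate(prompt):
    """Stand-in for a real model call (assumption for illustration)."""
    return prompt.upper()

stats = benchmark(fake_generate, ["hello", "summarize this", "translate that"])
print(stats["n"], "prompts timed")
```

Running the same prompts on each GPU or serving setup you are considering gives you comparable numbers before you commit to one.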

Key Points for Effective Data Analysis, 8/2/23, TikTok Instagram

Some data analysis tips: 1. Your data might be unrepresentative 2. Think about what was collected and what wasn't 3. Not all data is useful #datascience #machinelearning #dataanalysis

Tradeoff: Accuracy vs Interpretability in ML, 8/8/23, TikTok Instagram

Accuracy versus Interpretability/Explainability is a typical tradeoff in machine learning. Depending on your use case you may favor one over the other. Understanding this tradeoff helps you make better decisions. Manipulating and Measuring Model Interpretability: https://arxiv.org/pdf/1802.07810.pdf

Riveter: Python Package Analyzing Social Dynamics, 8/8/23, TikTok Instagram

Riveter 💪 is a Python package that measures social dynamics between personas mentioned in a collection of texts. Check it out at: https://github.com/maartensap/riveter-nlp Paper is here: http://maartensap.com/pdfs/antoniak2023riveter.pdf

Challenges of Using Large Language Models, 8/10/23, TikTok

Speed run - 8 minute video on 16 Challenges for using large language models (LLMs) 1. Unfathomable Datasets 2. Tokenizer-Reliance 3. High Pre-Training Costs 4. Fine-Tuning Overhead 5. High Inference Latency 6. Limited Context Length 7. Prompt Brittleness 8. Hallucination 9. Misaligned Behavior 10. Outdated Knowledge 11. Brittle Evaluation 12. Evaluations Based on Static Human-Written Ground Truth 13. Indistinguishability between Generated and Human-Written Text 14. Tasks Not Solvable By Scale 15. Lacking Experimental Designs 16. Lack of Reproducibility See the full paper: Challenges and Applications of Large Language Models - https://arxiv.org/pdf/2307.10169.pdf #machinelearning #largelanguagemodels

LLMs: Mimicking Plans, Not Planning, 8/12/23, TikTok Instagram

LLMs are approximate retrievers that are mimicking plans rather than truly planning. Great argument put forth by Subbarao Kambhampati, who is skeptical of the LLM reasoning arguments. Check out his YouTube for a longer discussion: #largelanguagemodels #machinelearning #gpt4 #aireasoning #aiplanning Avenging Polanyi's Revenge: Exploiting the Approximate Omniscience of LLMs in Planning without Deluding Yourself In the Process: https://youtu.be/BmyB-4S9QuY Full Abstract: LLMs are on track to reverse what seemed like an inexorable shift of AI from explicit to tacit knowledge tasks. Trained as they are on everything ever written on the web, LLMs exhibit "approximate omniscience"--they can provide answers to all sorts of queries with nary a guarantee. This could herald a new era for knowledge-based AI systems--with LLMs taking the role of (blowhard?) experts. But first we have to stop confusing the impressive form of the generated knowledge for correct content and resist the temptation to ascribe reasoning powers to approximate retrieval by these n-gram models on steroids. We have to focus instead on LLM-Modulo techniques that complement the unfettered idea generation of LLMs with careful vetting by model-based AI systems. In this talk I will reify this vision and attendant caveats in the context of the role of LLMs in planning tasks… Longer Tutorial: On the Role of Large Language Models in Planning (Tutorial Part 1) - https://youtu.be/wgVZvXDvry0

GPT4 Planning Abilities in Simulation Worlds, 8/13/23, TikTok Instagram

An experiment studying how well GPT4 can plan by using Block World and Mystery World. #largelanguagemodels #gpt4 #aiplanning #blockworld #mysteryworld On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark) - https://arxiv.org/abs/2302.06706 Lego Background: https://www.youtube.com/watch?v=2cOeUEjx-WI&ab_channel=BrickBuilder

Charlie's Comprehensive Guide to Llama 2, 8/15/23, TikTok Instagram

Some great tips from Charlie over at Replicate on using Llama 2. A guide to prompting Llama 2 - https://replicate.com/blog/how-to-prompt-llama To get started, there are lots of places to use Llama 2: HuggingChat - https://huggingface.co/chat/ Perplexity AI - https://labs.perplexity.ai/ Replicate - https://replicate.com/replicate/llama-2-70b-chat LLama video: https://www.youtube.com/shorts/-MFFBA8bdd8

Tradeoffs in Deploying Language Models, 8/15/23, TikTok Instagram

Latency is a key factor, but there are others when thinking about deploying large language models. Let's discuss tradeoffs between latency, throughput, accuracy, and cost. Latency Numbers Every Programmer Should Know - https://gist.github.com/jboner/2841832 Response Time 3 Limits - https://www.nngroup.com/articles/response-times-3-important-limits/ Background by Valentin Petkov: https://unsplash.com/photos/z06oDT-8pKQ

Intersection of Copyright and Language Models, 8/19/23, TikTok

Let's talk about how copyright intersects with large language models around training LLMs, outputs of LLMs, and watermarking mechanisms. #datascience #machinelearning #largelanguagemodels #copyright A lot of this material is derived from a brilliant talk by leading copyright scholar Pamela Samuelson: Large Language Models Meet Copyright Law https://www.youtube.com/watch?v=MFKV48ikV5E&ab_channel=SimonsInstitute

NanoGPT: Fast Tool for GPT Training, 8/20/23, TikTok Instagram

NanoGPT is a simple, fast repository for training/finetuning medium-sized GPTs. I recommend it for getting a deeper understanding of large language models. #datascience #machinelearning #largelanguagemodels #nanogpt NanoGPT: https://github.com/karpathy/nanoGPT NanoGPT_Simpsons: https://github.com/rajshah4/nanoGPT_simpsons

Revisiting Kmeans Illustration from Last Year, 8/22/23, TikTok

#onthisday reposting an older video from last year that illustrates kmeans

Challenges in Creating Useful AI Agents, 8/23/23, TikTok Instagram

The reality of AI Agents from Embra. While everyone hypes up Agents, it's a lot harder to make useful products based on Agents. #machinelearning #datascience #autogpt #embra #aiagents Original tweet by Zach of Embra: https://twitter.com/zachtratar/status/1694024240880861571 Agent Demo by Yohei: https://twitter.com/yoheinakajima/status/1640068466974633987?s=20

Hugging Face Valuation Reaches $4.5B, 8/24/23, TikTok Instagram

Hugging Face announced a new valuation of $4.5 billion! #datascience #machinelearning #huggingface

Symbolic Regression: Mathematical Data Representation, 8/25/23, TikTok Instagram

Symbolic regression focuses on a mathematical representation of your data. It's helpful in many situations where you need an explainable model or are trying to model something where a mathematical formula represents the data well. SRBench: https://cavalab.org/srbench/ Intro to Eureqa: https://www.youtube.com/watch?v=NhC1Qb-PQ5Q&ab_channel=Eureqa Background Matheus Frade: https://unsplash.com/photos/iSSO7Fj1F98 #machinelearning #datascience #eureqa #symbolicregression

Weekly AI News: Machine Learning, Data Science, 8/26/23, TikTok

AI news roundup for the week #machinelearning #datascience #rajistics

Using NanoGPT for Simpson Dataset Training, 8/27/23, TikTok

Replying to @Rajiv Shah | data science & AI NanoGPT is a simple, fast repository for training/finetuning medium-sized GPTs. I recommend it to get a better handle on large language models. This video walks through using it on a Simpsons dataset. It covers why I chose nanoGPT, how I munged the Simpsons dataset, how I trained my first model, and ways to keep learning. #datascience #machinelearning #rajistics #nanoGPT #simpsons NanoGPT: https://github.com/karpathy/nanoGPT NanoGPT_Simpsons: https://github.com/rajshah4/nanoGPT_simpsons Longer YT video: https://youtu.be/Ty2_bR1mrBQ

Examining Large Language Models' Evaluation, 8/28/23, TikTok Instagram

Evaluation of Large Language Models is a critical topic. Leaderboards provide a little guidance for evaluation but have many flaws. I am very focused on this topic this fall. I will speak on this topic at ODSC (SF) and Generative AI Summit (Austin) in October. If you have thoughts, please reach out. #largelanguagemodels #evaluatingllms #rajistics This hot take was prompted by a very hyped article from SemiAnalysis: https://www.semianalysis.com/p/google-gemini-eats-the-world-gemini Background by Marek Piwnicki: https://unsplash.com/photos/a-large-body-of-water-with-mountains-in-the-background-Z6RT0qH1Oec

Reinforcement Learning in Eat Melon Demo, 8/30/23, TikTok Instagram

Reinforcement learning with my Eat Melon! Demo This demo is based on Karpathy's work. Link: https://bit.ly/raj_eatmelon #datascience #reinforcementlearning #techtok #machinelearning #rajistics #chatgpt #rlhf

COLLIDE Data Conference in Atlanta, 8/30/23, TikTok

This group is holding COLLIDE Data Conference on October 3-4 at Center Stage Theater 🎸🎭 in the heart of midtown Atlanta, Georgia. Register now with promo code "INFLUENCER60" and take 60% OFF! #rajistics #datascience

Human Supervision Vital in AI, 9/1/23, TikTok Instagram

Human in the loop is important but it's not a silver bullet. #aiethics #tesla #cigna #rajistics Cigna: https://www.healthcaredive.com/news/cigna-lawsuit-algorithm-claims-denials-california/688857/ Tesla: https://www.caranddriver.com/news/a44185487/report-tesla-autopilot-crashes-since-2019/

Avoid Toxicity in Data Science Teams, 9/4/23, TikTok

Data science is often team work and you want to try to avoid toxic teams. #datascience #rajistics #teamwork #kaggle

Running AI Models in Browsers, 9/5/23, TikTok

Running large language models and transformer models locally in web browsers. Lots of tools for doing this, including mlc.ai, transformers.js, and webgpu. #largelanguagemodels #rajistics #mlcai #transformersjs #webgpu #browswerai Background by Andrea De Santis: https://unsplash.com/photos/a-large-group-of-red-and-white-masks-EMvUZHjbxtI

RLAIF: New Approach Surpasses RLHF, 9/7/23, TikTok Instagram

Reinforcement Learning with AI Feedback (RLAIF) is an emerging approach to replace Reinforcement Learning with Human Feedback (RLHF). It works well according to the latest paper from Google. RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback: https://arxiv.org/abs/2309.00267 #largelanguagemodels #chatgpt #rlhf #rlaif #reinforcementlearning #rajistics

Excellent Analogy with Untapped Material, 9/9/23, TikTok

It's uncomfortable how good this analogy is. I have so much more material I left out. Enjoy this; I don't know how long it will stay up. #largelanguagemodels #rajistics #chatgpt #politics

Visualizing Stable Diffusion Latent Space, 9/10/23, TikTok

#onthisday Showing the latent space for stable diffusion. #stablediffusion #datascience #machinelearning #codetok #umapêpravocê

YOLO's Creator Joseph Redmon Retires, 9/13/23, TikTok

YOLO is a seminal model in object detection for computer vision. But what is even more interesting is the principal author Joseph Redmon and his journey. While YOLO is still actively developed he has stopped participating. You Only Look Once: Unified Real-Time Object Detection (2015): https://arxiv.org/abs/1506.02640 YOLOv3: An Incremental Improvement: https://arxiv.org/abs/1804.02767 YOLO web site: https://pjreddie.com/darknet/yolo/

Stable Diffusion Map Displayed Today, 9/15/23, TikTok

#onthisday showing the map of stable diffusion. #datascience #machinelearning #stablediffusion #rajistics

Weekly AI News: Key Highlights, 9/16/23, TikTok

Your AI news this week. #google #openai #meta #apple #tesla #rajistics

Harvard Study: GPT-4 Boosts Office Productivity, 9/18/23, TikTok Instagram

Breaking study from Harvard showing the impact of Large Language Models like GPT-4 on office productivity. #datascience #gpt4 #officeproductivity #chatgpt #rajistics Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321

NVIDIA GPUs: Accelerating AI Evolution, 9/19/23, TikTok Instagram

GPUs driven by NVIDIA are the key to today's AI. Without this compute we would not have the models like GPT-4. Let's review why GPU performance has grown so fast over the last ten years. #machinelearning #nvidia #deeplearning #rajistics 1000X: https://www.nextplatform.com/2023/03/31/a-peek-into-the-future-of-ai-inference-at-nvidia/ NVIDIA docs: https://oc.acm.org/docs/DL_HW_OC_ACM_0322.pdf Background by JuliusH: https://pixabay.com/videos/space-universe-spaceship-starry-sky-44690/

Basic Conformal Prediction for Intervals, 9/21/23, TikTok

Older video: Getting prediction intervals with conformal prediction. This is a very simple intro; it can do much more. #datascience #statistics #predictioninterval #conformalprediction #rajistics
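
A minimal sketch of split conformal prediction for intervals, not necessarily the approach shown in the video. It assumes you already have point predictions from some model on a held-out calibration set; the function name `conformal_interval` and the numbers below are made up for illustration.

```python
import math

def conformal_interval(cal_preds, cal_actuals, new_pred, alpha=0.1):
    """Split conformal interval around new_pred targeting ~(1 - alpha) coverage."""
    # Nonconformity scores: absolute residuals on the held-out calibration set.
    scores = sorted(abs(y - p) for p, y in zip(cal_preds, cal_actuals))
    n = len(scores)
    # Finite-sample corrected rank of the quantile: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return (-math.inf, math.inf)
    q = scores[rank - 1]
    return (new_pred - q, new_pred + q)

# Hypothetical calibration data (model predictions vs. what actually happened).
cal_preds = [10.0, 12.0, 9.0, 11.0, 13.0, 8.0, 10.5, 12.5, 9.5]
cal_actuals = [11.0, 11.5, 9.2, 13.0, 12.0, 8.4, 10.0, 15.0, 9.0]

low, high = conformal_interval(cal_preds, cal_actuals, new_pred=10.0, alpha=0.25)
print(low, high)
```

The interval width comes entirely from the calibration residuals, so it is the same for every new prediction; adaptive variants exist but are outside this sketch.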

Phi-CTNL Achieves State-of-the-Art Results Through Leakage, 9/25/23, TikTok Instagram

State-of-the-art results (100%!!) on widely used academic benchmarks (MMLU, GSM8K, HumanEval, OpenbookQA, ARC Challenge, etc.). The model, called phi-CTNL, was trained on the evaluation datasets. Yes, the performance is all due to leakage, and this model is a parody. #machinelearning #datascience #rajistics #phiCTNL Pretraining on the Test Set Is All You Need: https://arxiv.org/pdf/2309.08632.pdf Background video by Frankkemperrupp: https://pixabay.com/videos/tube-burst-pipe-water-plumber-99693/

ADbench: Introduction to Anomaly Detection, 9/26/23, TikTok Instagram

Anomaly detection is hard. This is an introduction to anomaly detection algorithms. The video focuses on the results for ADBench and what data scientists should now do. ADBench: Anomaly Detection Benchmark - https://arxiv.org/abs/2206.09426 #datascience #analytics #codetok #anomalydetection @rajistics

Exploring Non-Productivity Uses of AI, 9/28/23, TikTok Instagram

AI for other than productivity. Let's talk about how people are really using AI. #datascience #machinelearning #rajistics #therapy Lilian Weng post: https://twitter.com/lilianweng/status/1706544602906530000 LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset: https://arxiv.org/pdf/2309.11998.pdf

ChatGPT Enhances Routine Data Science Tasks, 9/29/23, TikTok Instagram

ChatGPT with the Code Interpreter can do a lot of common data science tasks. We are going to see more tools help with routine data science tasks. #datascience #machinelearning #chatgpt #codeinterpreter #rajistics What Should Data Science Education Do with Large Language Models? https://arxiv.org/pdf/2307.02792v2.pdf

Starcoder Leads in Open-source Code Assistants, 9/30/23, TikTok Instagram LinkedIn

There are lots of open-source code assistant tools. Starcoder is the best known but many people are training and fine-tuning their own models. #machinelearning #bigcode #starcoder #copilot #textgenerationinference #sqlcoder #tabby #refact Big Code Leaderboard: https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard SQLCoder: https://github.com/defog-ai/sqlcoder StarCoder: https://huggingface.co/bigcode/starcoder Text Generation Inference: https://github.com/huggingface/text-generation-inference Other Products: https://refact.ai/ https://tabby.tabbyml.com/

Meta Leads in Respecting Data Scientists, 10/5/23, TikTok Instagram

Three new multimodal models this week but only one respects data scientists. Once again it's Meta doing it right. #machinelearning #multimodal #rekaai #openai #meta #rajistics Reka: https://reka.ai/announcing-our-multimodal-ai-assistant/ OpenAI: https://cdn.openai.com/papers/GPTV_System_Card.pdf Microsoft: The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) - https://arxiv.org/pdf/2309.17421.pdf Meta: AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model - https://arxiv.org/pdf/2309.16058.pdf

Unlearning Process Discovered for LLMs, 10/7/23, TikTok Instagram

Obliviate is now possible for LLMs. Microsoft researchers share an approach to get Large Language Models to unlearn information. #harrypotter #largelanguagemodels #machinelearning #rajistics Who's Harry Potter? Approximate Unlearning in LLMs: https://browse.arxiv.org/pdf/2310.02238.pdf Background image from jazzmeister: https://pixabay.com/photos/cathedral-cloisters-harry-potter-2286910/

Baseline Models Essential for Time Series, 10/9/23, TikTok Instagram

Always have a baseline model. For time series you can often compare to what happened in a previous time step like last week. There are error metrics like MASE built on this idea. #datascience #codetok #statistics #timeseriesforcasting #timeseries
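
As a rough illustration of the baseline idea, here is a sketch of MASE (mean absolute scaled error): the forecast's mean absolute error divided by the in-sample error of a naive "repeat the value from `season` steps ago" forecast. The function name `mase` and the sample numbers are assumptions for illustration only.

```python
def mase(actuals, forecasts, train, season=1):
    """Mean absolute scaled error relative to a (seasonal) naive baseline."""
    # Mean absolute error of the model's forecasts on the test period.
    mae_forecast = sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)
    # In-sample error of the naive baseline: predict the value `season` steps back.
    naive_errors = [abs(train[i] - train[i - season]) for i in range(season, len(train))]
    mae_naive = sum(naive_errors) / len(naive_errors)
    return mae_forecast / mae_naive

# Hypothetical series: history used for the baseline, then two forecasted points.
train = [10, 12, 14, 13, 15, 17]
actuals = [18, 16]
forecasts = [17, 17]

score = mase(actuals, forecasts, train, season=1)
print(round(score, 3))
```

A score below 1 means the forecast beat the naive baseline on average; a score above 1 means you would have done better by repeating the previous value.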

Gratitude Towards Barrnanas and AmplifyPartners, 10/10/23, TikTok

Thanks to barrnanas and AmplifyPartners

Essential Distance Metrics in Data Science, 10/15/23, TikTok Instagram

Getting the best distance metric is crucial for solving analytical problems. This video reviews Euclidean, Manhattan, Mahalanobis, Levenshtein, and cosine distance. There are many more, and sometimes you have to create your own. #datascience #machinelearning #distancemetrics #rajistics
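
For reference, a small plain-Python sketch of four of these metrics. Mahalanobis is left out because it needs an inverse covariance matrix estimated from data; the function names here are my own, not from any library.

```python
import math

def euclidean(a, b):
    # Straight-line distance between two numeric vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences ("city block" distance).
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity; ignores vector length, compares direction only.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def levenshtein(s, t):
    # Minimum number of single-character insertions, deletions, or substitutions.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

print(euclidean([0, 0], [3, 4]))
print(manhattan([0, 0], [3, 4]))
print(cosine_distance([1, 0], [0, 1]))
print(levenshtein("kitten", "sitting"))
```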

LLMs Memorization with Starcoder Checker, 10/17/23, TikTok Instagram

How LLMs memorize information! Check out the Starcoder Memorization space by Mithril Security and the notebook so you can look for LLM memorization on your own (assuming you have access to the training data). #largelanguagemodels #security #mithrilsecurity #memorization #rajistics StarCoder Memorization Checker: https://huggingface.co/spaces/mithril-security/starcoder_memorization_checker Notebook: https://colab.research.google.com/drive/1YaaPOXzodEAc4JXboa12gN5zdlzy5XaR?usp=sharing

New Open-Source LLM Surpasses GPT4, 10/19/23, TikTok Instagram

A new LLM focused on data annotation and labeling beats GPT4. It's built from Llama 13B and will be open source. #datascience #machinelearning #semisupervised #refuelai #datalabeling To learn more: Blog post: https://www.refuel.ai/blog-posts/announcing-refuel-llm Refuel LLM is found on their LLM labeling playground: https://labs.refuel.ai/playground

Weekly AI News: Major Tech Highlights, 10/21/23, TikTok Instagram

AI News for the week featuring OpenAI, NVIDIA, Google, Apple, Stanford, Hugging Face, Anthropic, and Microsoft. #machinelearning #openai #rajistics #nvidia

NASA Utilizes Generative AI for Space Manufacturing, 10/28/23, TikTok Instagram

NASA uses generative AI for manufacturing parts for space. It's a great use of generative technology and you can start seeing how it will change engineering in the long run. Check out more of Ryan McClelland's work: Generative Design and Digital Manufacturing: Using AI and robots to build lightweight instruments - https://ntrs.nasa.gov/api/citations/20220012523/downloads/McClelland-Generative%20Design%20SPIE%202022.pdf Ryan McClelland - NASA - Generative Design & Digital Manufacturing at NASA Goddard - CDFAM: https://youtu.be/t_h_WmBhRXA?si=5zjqt7DWejyEFXTc NASA Uses AI to Design 3D Printed Parts for Exoplanet Mission | The Cool Parts Show #61 - https://youtu.be/x_Jt1jiQjhA?si=J-dnBzPkh8N9kUvz #generativeai #nasa #rajistics

AI Persuasion Tactics and Emotional Avatars, 10/29/23, TikTok Instagram

AI News (Oct 29th 2023) with a focus on AI for persuasion. #openai #meta #google #rajistics OpenAI on super persuasion: https://twitter.com/sama/status/1716972815960961174 Lawsuits accuse Facebook Instagram of targeting children - https://red.msudenver.edu/2023/lawsuits-accuse-facebook-instagram-of-targeting-children/ AI firms are paying professional actors $150 an hour to lend emotions to avatars - https://qz.com/ai-firms-are-paying-professional-actors-150-an-hour-to-1850946320 LAION collecting emotional data - https://laion.ai/blog/open-empathic/ GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models - https://arxiv.org/abs/2303.10130 Rewarding Chatbots for Real-World Engagement with Millions of Users: https://arxiv.org/abs/2303.06135 Gemini - still waiting: https://www.silverliningsinfo.com/ai/alphabets-new-gemini-ai-could-be-lifeline-google-cloud-growth

Visualize Data Using Anscombe's Quartet, 10/30/23, TikTok Instagram

Reminder to visualize your data with one of my favorites Anscombe's quartet #anscombesquartet #datavisualization #datascience #statistics #rajistics

Executive Order on Safe AI Development, 11/1/23, TikTok Instagram

Breaking News: Executive Order on AI. Quick video on the main issues; there is a lot more in the Order. It is over 100 pages. #executiveorderai #rajistics #openai #meta Resources: Exec Order: https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/ What the executive order means for openness in AI: https://www.aisnakeoil.com/p/what-the-executive-order-means-for

Importance of Ablation Studies in AI, 11/5/23, TikTok Instagram

When analyzing improvements in AI, always take a look at the ablation studies. An important part is making sure the compute was held constant across the ablation studies. In machine learning, the more compute a model gets, the better its performance usually is. #datascience #machinelearning #ablation #rajistics Ablations are really important: https://nonint.com/2023/06/25/ablations-are-really-important/ ConvNets Match Vision Transformers at Scale: https://arxiv.org/pdf/2310.16764.pdf

Training Methodology of Whisper v3 Explained, 11/7/23, TikTok Instagram

Diving into how Whisper v3 was trained. OpenAI used a combination of weak supervision and pseudo-labeling. #whisper #openai #rajistics Whisper: https://github.com/openai/whisper Whisper v3 release note: https://github.com/openai/whisper/discussions/1762 Robust Speech Recognition via Large-Scale Weak Supervision: https://arxiv.org/abs/2212.04356 wav2vec: Unsupervised Pre-training for Speech Recognition: https://arxiv.org/abs/2006.11477

Discussing AGI: Balancing Hype and Rigor, 11/9/23, TikTok Instagram

Trying to talk about AGI in a reasonable manner. There needs to be less hype and more rigor in talking about AGI. The Deepmind paper provides a good discussion of how to think about AGI and where we are now. #deepmind #agi #machinelearning #rajistics Levels of AGI: Operationalizing Progress on the Path to AGI https://arxiv.org/pdf/2311.02462.pdf

Controlling Language Models using Logit Bias, 11/11/23, TikTok Instagram

Influence the words that come out of a large language model with logit bias. This is often used to make personal ban lists. #openai #largelanguagemodels #logitbias #temperature #rajistics Transformer Lens: https://github.com/neelnanda-io/TransformerLens Car Example: https://www.reddit.com/r/LocalLLaMA/comments/13j3747/tutorial_a_simple_way_to_get_rid_of_as_an_ai/
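
To make the mechanism concrete, here is a toy sketch of what a logit bias does: a fixed offset is added to selected tokens' logits before the softmax, so a large negative bias effectively bans a token. The three-token vocabulary and logit values are made up; real APIs typically key the bias by token ID rather than by string.

```python
import math

def softmax(logits):
    # Convert raw logits into probabilities (max subtracted for numerical stability).
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def apply_logit_bias(logits, bias):
    # Add the bias to each listed token's logit; other tokens are left unchanged.
    return {tok: v + bias.get(tok, 0.0) for tok, v in logits.items()}

# Hypothetical next-token logits from a model.
logits = {"car": 2.0, "truck": 1.5, "bike": 0.5}
# A "ban list" entry: push "car" so low that it is practically never sampled.
bias = {"car": -100.0}

before = softmax(logits)
after = softmax(apply_logit_bias(logits, bias))
print(before)
print(after)
```

The probability mass that "car" loses is redistributed over the remaining tokens in proportion to their original probabilities.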

OpenAI Releases Updates on Developer Day, 11/12/23, TikTok Instagram

OpenAI dropped some big releases for its developer day. Let's catch up on the news in early November 2023. #openai #meta #nvidia #01.ai #h2o.ai #google #china #rajistics #microsoft

Visualizing Progress and Errors Effectively, 11/14/23, TikTok Instagram

Waterfall charts that show your progress as well as explaining the error! This is what I like to see when I see a visualization of model results. #datascience #machinelearning #modelreview #datavisualizations OpenAI example: https://www.youtube.com/watch?v=ahnGLM-RC1Y&ab_channel=OpenAI The Pinterest Ads example came from a talk by Aayush Mudgal at the Generative AI MLOps Conference

OpenAI Promotes Non-deterministic LLM Inference, 11/14/23, TikTok

Non-deterministic LLM inference is a big deal. OpenAI has started offering reproducible outputs, hoping the rest of the providers will also offer them. For enterprise applications, model reproducibility and reducing model risk make controlling non-determinism very important. #openai #largelanguagemodels #nondeterministic #rajistics OpenAI Documentation: https://platform.openai.com/docs/guides/text-generation/reproducible-outputs OpenAI review: https://cobusgreyling.medium.com/now-you-can-toggle-openai-model-determinism-8b661e02cf98 Video by Google DeepMind: https://www.pexels.com/@googledeepmind/ Text to SQL example from Dr. Ilyas Iyoob Gen AI presentation at MLOps Conference
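
A toy illustration of why a seed matters, not OpenAI's implementation: token sampling draws from a probability distribution, so the same prompt can yield different outputs unless the random number generator is seeded. The logits and the `sample_tokens` helper are hypothetical.

```python
import math
import random

def sample_tokens(logits, n, temperature=1.0, seed=None):
    """Draw n tokens from softmax(logits / temperature) using an optional seed."""
    rng = random.Random(seed)  # seed=None falls back to system randomness
    tokens = list(logits)
    # Unnormalized softmax weights; random.choices normalizes them internally.
    weights = [math.exp(logits[t] / temperature) for t in tokens]
    return rng.choices(tokens, weights=weights, k=n)

# Hypothetical next-token logits.
logits = {"SELECT": 2.0, "WITH": 1.0, "INSERT": 0.2}

run_a = sample_tokens(logits, n=10, seed=42)
run_b = sample_tokens(logits, n=10, seed=42)
print(run_a == run_b)  # same seed, same draws
```

In a hosted service, a fixed seed only helps if the model version and serving stack also stay the same, which is why providers tend to describe seeded outputs as "mostly" reproducible.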

OpenAI Participates in Red Wedding, 11/18/23, TikTok

OpenAI and Red Wedding

OpenAI Fires Sam Altman: Reactions, 11/18/23, TikTok Instagram LinkedIn

The tech world has so many reactions to OpenAI firing Sam Altman. Here are some very quick reactions. #openai #rajistics

Enterprises Seek Ownership Over AI Systems, 11/19/23, TikTok YouTube

OpenAI's turmoil this last week will ensure enterprise AI strategies will not depend on OpenAI. It's clear for any valuable AI systems it's firmly in the enterprise's interest to own it (open source) or have an alternative provider. #openai #rajistics #opensource #copilot

Optimizing Transformers in Machine Learning, 11/21/23, TikTok Instagram YouTube

Everyone is using transformers! Are you working on optimizing your use? The community has been steadily finding ways to optimize transformers. What is working for you? #machinelearning #transformers #pytorch #rajistics Accelerating Generative AI with PyTorch: Segment Anything Fast: https://pytorch.org/blog/accelerating-generative-ai/ Is Attention all you need: https://www.isattentionallyouneed.com/ Background of Prague Library: https://pixabay.com/illustrations/prague-library-prague-monastery-980732/

Analyzing GPU Scarcity: Insights & Impact, 11/22/23, TikTok Instagram

Are you GPU Poor? A deep dive into the state of GPUs based on the work of Dylan Patel of Semi Analysis. How are you coping with the lack of GPUs? #machinelearning #gpus #nvidia #rajistics The State of Silicon and the GPU Poors - with Dylan Patel of SemiAnalysis: https://www.latent.space/p/semianalysis The AI-first transformation of the Data Center: Insights from Nvidia's Q3 2024 Earnings - https://www.datagravity.dev/p/the-ai-first-transformation-of-the Semi-Analysis: https://www.semianalysis.com/

Managing Missing Data in Statistics, 11/24/23, TikTok Instagram

Do you have a missing data story? Missing data happens all the time. Should you just accept it? Drop rows? Use imputation? Or keep digging? There is often a reason for missing data. Don't just jump to dropping rows or using imputation techniques. #dataengineering #statistics #datascience #imputation

Importance of Model Calibration Explored, 11/26/23, TikTok Instagram YouTube

Do you calibrate your models? For many types of models you may need to calibrate them. This video reminds us of the importance of calibration. To dig deeper, check out Platt scaling or isotonic regression. #machinelearning #datascience #statistics #rajistics #colorcalibration
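
As a starting point for digging deeper, here is a minimal sketch of isotonic regression using the pool-adjacent-violators algorithm. It maps raw model scores to calibrated probabilities that never decrease as the score increases. The scores and labels are made up, and tied scores are not given any special treatment in this simplified version.

```python
def isotonic_calibrate(scores, labels):
    """Return calibrated probabilities, non-decreasing in the raw score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    # Each block stores [sum of labels, number of points].
    blocks = []
    for i in order:
        blocks.append([float(labels[i]), 1])
        # Pool adjacent blocks while the earlier block's mean exceeds the later one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Assign each point the mean label of the block it ended up in.
    calibrated = [0.0] * len(scores)
    pos = 0
    for total, count in blocks:
        for i in order[pos:pos + count]:
            calibrated[i] = total / count
        pos += count
    return calibrated

# Hypothetical held-out scores from a classifier and the true outcomes.
scores = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
labels = [0, 1, 0, 1, 1, 1]

calibrated = isotonic_calibrate(scores, labels)
print(calibrated)
```

Fit the mapping on held-out data rather than the training set, otherwise the calibration inherits the model's overconfidence.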

Understanding OpenAI's Q*, GPT-4 Limitations, 11/28/23, TikTok Instagram

Q* from OpenAI is getting the hype, but let's focus on the basics of their organization and the limitations of GPT-4 around planning. This video covers some of the concepts. To dig deeper, check out my earlier videos from August 2023: Planning with LLMs: https://www.tiktok.com/@rajistics/video/7266582668555423019 @Rajiv Shah | data science & AI Block World: https://www.tiktok.com/@rajistics/video/7266866822639635754 @Rajiv Shah | data science & AI #largelanguagemodels #aiplanning #gpt4 #qstar #openai #rajistics

Mastering Prompting in Special-Purpose Models, 11/29/23, TikTok Instagram YouTube

The power of prompting! How to use a general purpose model to be a special purpose fine tuned model. It's really important to learn good prompting strategies. #largelanguagemodels #promptengineering #openai #gpt4 #medpalm #rajistics 5 Pillars of Prompting: Prompt Engineering: From Words to Art and Copy https://www.saxifrage.xyz/post/prompt-engineering Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine https://arxiv.org/pdf/2311.16452.pdf

Categorical Data Processing: Encoding Techniques, 12/1/23, TikTok Instagram

Working with categorical data using ordinal, one-hot (dummy), and target encoding. Do you have your own favorite approach? And ChatGPT tells me that gender has many categories. #datascience #statistics #analytics #featureengineering
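
A compact plain-Python sketch of the three encodings on a made-up color column. The helper names are my own; in practice you would likely reach for a library, and target encoding should be fit on training folds only to avoid leaking the target.

```python
from collections import defaultdict

def ordinal_encode(values, categories):
    # Map each category to its integer position in a fixed category list.
    return [categories.index(v) for v in values]

def one_hot_encode(values, categories):
    # One 0/1 indicator column per category.
    return [[int(v == cat) for cat in categories] for v in values]

def target_means(values, target):
    # Mean of the target for each category (the lookup used by target encoding).
    sums, counts = defaultdict(float), defaultdict(int)
    for v, y in zip(values, target):
        sums[v] += y
        counts[v] += 1
    return {v: sums[v] / counts[v] for v in sums}

# Hypothetical feature column and binary target.
colors = ["red", "green", "blue", "green", "red", "red"]
target = [1, 0, 1, 1, 0, 1]
categories = sorted(set(colors))  # ['blue', 'green', 'red']

ordinal = ordinal_encode(colors, categories)
one_hot = one_hot_encode(colors, categories)
means = target_means(colors, target)
target_encoded = [means[c] for c in colors]

print(ordinal)
print(one_hot[0])
print(target_encoded)
```

Note that the ordinal codes here follow alphabetical order, which implies an ordering the colors do not really have; that is fine for tree models but can mislead linear ones.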

Regularization Prevents Overfitting in Models, 12/3/23, TikTok

Regularization is a technique to keep your model from overfitting. It's widely used in machine learning. #datascience #statistics #regularization
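
One way to see the effect is a one-feature ridge (L2) regression without an intercept: minimizing sum((y - w*x)^2) + lam * w^2 gives w = sum(x*y) / (sum(x*x) + lam), so a larger penalty shrinks the coefficient toward zero. This is a sketch with made-up data; `ridge_slope` is a hypothetical helper, not a library function.

```python
def ridge_slope(xs, ys, lam):
    """Slope of a no-intercept linear fit with L2 penalty lam * w**2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Closed-form minimizer of sum((y - w*x)**2) + lam * w**2.
    return sxy / (sxx + lam)

# Hypothetical data that follows y = 2x exactly.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

for lam in (0.0, 1.0, 14.0, 100.0):
    print(lam, ridge_slope(xs, ys, lam))
```

With lam = 0 the fit recovers the unpenalized slope; as lam grows, the model trades some fit on the training data for a smaller, more conservative coefficient.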

Google Gemini Dropped: TPUs Adjusted, 12/6/23, TikTok Instagram YouTube LinkedIn

Google dropped Gemini. Let's talk about the different sizes, tweaked benchmarks, multimodality, training on TPUs, and how it's not that exciting. #gpt4 #gemini #google #rajistics

Meta Introduces Llama Guard for AI Content Moderation, 12/8/23, TikTok Instagram YouTube

Meta released Llama Guard for content moderation. It looks to be effective and very adaptable. This is part of their Purple Llama project around trust for generative AI. #contentmoderation #largelanguagemodels #meta #rajistics #llamaguard Model: https://huggingface.co/meta-llama/LlamaGuard-7b Blog: https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/ Prompts: https://twitter.com/_philschmid/status/1732817699862966654?s=20

New Mixtral Model Pre-Released Friday, 12/9/23, TikTok Instagram YouTube

Mixtral is a new model using a mixture of experts (MoE) approach. It consists of 8x7B Mistral models. It was pre-released on Friday; look for more details to come. #largelanguagemodels #mixtral #mistral #rajistics A version of the Mixtral model is here: https://huggingface.co/DiscoResearch/DiscoLM-mixtral-8x7b-v2 For more about the MoE approach, check out MegaBlocks: Efficient Sparse Training with Mixture-of-Experts: https://arxiv.org/abs/2211.15841

Explore Attention Alternatives and State Space Models, 12/14/23, TikTok

To dig deeper go watch Sasha Rush's video on alternatives to attention: https://youtu.be/dKJEpOtVgXc?si=Lx94-51PsjGF-YZT Dig deeper with the Annotated S4 paper into state space models: https://srush.github.io/annotated-s4/ Check out Mamba: https://github.com/state-spaces/mamba #rajistics

Learning Data Science Through Practice, 12/17/23, TikTok

The best way to learn data science is working with data. You don't need to spend money on courses or books. Spend time doing useful projects. #machinelearning #datascience

Automating Data Labeling with Machine Learning, 12/20/23, TikTok Instagram

The skit addresses the challenge of acquiring large volumes of labeled data for machine learning projects. The video focuses on using machine learning models for automating the labeling process. This approach is highlighted with the mention of MLflow, a platform that now supports using models as judges for data labeling. #datalabeling #mlflow #rajistics More details: Automation of Data Labeling: Emphasizes the shift from manual to automated data labeling, highlighting efficiency and cost-effectiveness. Integration with MLflow: Showcases the practical application of recent advancements in MLflow that facilitate the use of machine learning models for data evaluation and labeling. Model Output and Justification: The discussion underscores the importance of not just the labeling output but also the accompanying justifications, providing insights into the model's decision-making process. Accuracy and Bias Consideration: Highlights the correlation between model-generated labels and human-labeled data while acknowledging the potential biases inherent in machine learning models.

2023 AI Advancements: Detailed LinkedIn Post, 12/22/23, TikTok Instagram

If you want more details on the biggest advancements in AI for 2023 then find me on LinkedIn or Threads where I have a detailed post with all the links. #2023 #ai #predictions #rajistics

Diverse Future for Large Language Models, 12/24/23, TikTok Instagram

The future will be many different LLMs, some open source and some proprietary. Others like Yann LeCun think differently. Yann's thread: https://www.threads.net/@yannlecun/post/C1NQlcmvqxA #largelanguagemodels #rajistics

Understanding Sampling Bias Shortcomings, 12/25/23, TikTok Instagram YouTube

Sampling bias

DINOv2: Self-Supervised Computer Vision Model, 12/27/23, TikTok Instagram YouTube

DINOv2 is a self-supervised machine learning model for computer vision. It can be used for a variety of image tasks like image classification, object detection, and video understanding without any fine tuning. To learn more, check out Paper: https://arxiv.org/pdf/2304.07193.pdf Github: https://github.com/facebookresearch/dinov2 See the post from Yann on my 2023 AI Advancements post: https://www.threads.net/@rajistics/post/C1H6pe9gXLz

Challenging Assumptions for Better Data Analysis, 12/30/23, TikTok YouTube

Don't be afraid to challenge established models and assumptions! Often spending time with the data can give you new insights. One common limitation is the dependence on averages in lots of models. Often data is distributed in other ways. #datadistributions #dataanalysis #rajistics Paper: New insights into US flood vulnerability revealed from flood insurance big data - https://www.nature.com/articles/s41467-020-15264-2

Alternatives to K-Means Clustering Methods, 12/31/23, TikTok YouTube

Some alternatives to clustering with k-means. This skit was inspired by the examples in Schubert's paper on why to stop using the elbow criterion for k-means. Any other clustering fails out there? Covering: normalization, Gaussian mixture models, DBSCAN, HDBSCAN #datascience #statistics #machinelearning #kmeans #clustering #rajistics Stop using the elbow criterion for k-means and how to choose the number of clusters instead: https://arxiv.org/abs/2212.12189 This is a repost from last year.

New York Times Sues OpenAI: Copyright Implications, 1/1/24, TikTok Instagram YouTube

The New York Times recently filed a lawsuit against OpenAI. This is another of many copyright lawsuits against AI companies. While everyone is using the NYT data, it's not going to be easy to get copyright to substantially support the claims of NYT. I am skeptical of any significant change. Some other reminders: - Many of these copyright cases against AI - Apple, OpenAI, and others are negotiating agreements with publishing companies - Microsoft and the tech industry have survived many of these types of cases and still thrived - A lawsuit can take a long time to wind through the courts - Today is an important day in copyright with Mickey Mouse coming into the public domain (but is still protected by trademark) - As content providers recognize the value of their content, this will hurt the open source movement with less good data #openai #newyorktimes #copyright #mickeymouse #disneyplusvoices

2023 Favorite Data Visualizations Highlighted, 1/3/24, TikTok YouTube

Let's learn from the best! Feel free to share your favorites. My list comes from the Data Vis Dispatch list of favorite visualizations for 2023. Data Vis Dispatch: https://blog.datawrapper.de/data-vis-dispatch-december-19-2023/ #datavisualizations #rajistics

Teamwork Inspired From Timmel's Joke, 1/5/24, TikTok

What would you have done? Working with teammates and inspired / copied from Nathan Timmel's joke on canned pumpkin. #teamwork #weaponizedincompetence #rajistics

Cultrix Tops Leaderboard with Generative AI, 1/5/24, TikTok Instagram YouTube

Time to get started with Generative AI. Look at how Cultrix got their model to the top of the OpenLLM Leaderboard. You can do this too! Cultrix's model: https://huggingface.co/CultriX/MistralTrix-v1 Fine Tune Mistral: https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac Generative AI course: https://github.com/mlabonne/llm-course/

Useful Statistical Concepts in Data Science, 1/7/24, TikTok Instagram YouTube

Statistics sounds heavy, but a lot of concepts are very useful and can save you a lot of effort. This video is a reminder of the many ways we use statistical concepts like random in data science. #statistics #random #rajistics

Scaling Laws: DeepMind Counteracts OpenAI's Wisdom, 1/9/24, TikTok

Repost, but scaling laws are still very important. Scaling laws help us figure out how to manage the amount of training data versus the model size. DeepMind showed with Chinchilla that by using more data you can use a smaller model. This went against the known wisdom from OpenAI's research. This is a big deal because lots of resources are spent on building those models. #datascience #machinelearning #largelanguagemodels #openai #deepmind #nvidia #microsoft #azure #huggingface #chatgpt
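
As a rough illustration of the data-versus-size trade-off, the sketch below leans on two widely quoted rules of thumb that are not stated in the description itself: training compute of roughly 6 x parameters x tokens FLOPs, and a Chinchilla-style ratio of roughly 20 training tokens per parameter. Both constants, the function name, and the example budget are assumptions for illustration, not figures taken from the video.

```python
import math

FLOPS_PER_PARAM_TOKEN = 6   # assumption: C ~ 6 * N * D training FLOPs
TOKENS_PER_PARAM = 20       # assumption: Chinchilla-style rule of thumb


def compute_optimal_split(flops_budget):
    """Return (parameters, tokens) that spend the budget at the assumed ratio."""
    # C = 6 * N * D and D = 20 * N  =>  N = sqrt(C / (6 * 20))
    params = math.sqrt(flops_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


if __name__ == "__main__":
    budget = 5.88e23  # hypothetical training budget in FLOPs
    n, d = compute_optimal_split(budget)
    print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```

Under these assumptions, a fixed budget is better spent on a smaller model trained on more tokens than on a much larger model trained on few tokens.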

DataMapPlot and BERTopic for Data Clustering, 1/9/24, TikTok Instagram YouTube

Like beautiful plots of data maps? Check out DataMapPlot from Leland McInnes. To make the best use of this you will need to have your data clustered. If you aren't sure where to start, BERTopic is my suggestion. DataMapPlot: https://github.com/TutteInstitute/datamapplot BERTopic: https://maartengr.github.io/BERTopic/index.html #datamap #topicclustering #rajistics

Understanding LLMs' Sensitivity in Prompt Design, 1/11/24, TikTok Instagram YouTube LinkedIn

Prompt sensitivity is a thing. This video covers how changes in formatting, the persuasion used in prompts, and prompt injection attacks are all crucial considerations in working with LLMs. QUANTIFYING LANGUAGE MODELS' SENSITIVITY TO SPURIOUS FEATURES IN PROMPT DESIGN or: How I learned to start worrying about prompt formatting: https://arxiv.org/pdf/2310.11324.pdf Anthropic evaluation: https://www.anthropic.com/index/evaluating-ai-systems How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs - https://chats-lab.github.io/persuasive_jailbreaker/ #largelanguagemodels #openai #anthropic #promptinjection #prompting #rajistics

Importance of Visualizing Data Highlighted, 1/14/24, TikTok YouTube

"Anscombe's quartet" and the "datasaurus dozen" remind us of the importance of visualizing data.
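
To make the point concrete, here is a small sketch that computes summary statistics for the four Anscombe datasets. The values are the commonly published quartet data, and the helper names are chosen for this example. The four datasets share nearly identical means and correlations even though their scatter plots look completely different.

```python
from statistics import mean

X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    for name, (xs, ys) in ANSCOMBE.items():
        print(name, round(mean(xs), 2), round(mean(ys), 2), round(pearson(xs, ys), 2))
```

Only a plot reveals that one dataset is a curve, one is a line with a single outlier, and one has all but one point at the same x value.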

Security Risks and Solutions in LLMs, 1/18/24, TikTok Instagram YouTube

LLMs have a lot of security issues, from prompt injection attacks and extraction of training data to data poisoning and even GPU-based attacks. However, most of these can be managed, so be aware but not too scared. Some relevant papers: Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training https://arxiv.org/abs/2401.05566 Extracting Training Data https://www.usenix.org/system/files/sec21-carlini-extracting.pdf Poisoning Web-Scale Training Datasets is Practical https://arxiv.org/pdf/2302.10149.pdf https://www.youtube.com/watch?v=la7_sgp0iKY&ab_channel=BlackHat #largelanguagemodels #promptinjection #datapoisoning #security #rajistics

Undervalued Open Source Software's Worth, 1/19/24, TikTok YouTube

Don't let people overlook open source software. It might be free but it's priceless. The Value of Open Source Software at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4693148 #opensource #rajistics

Axolotl Simplifies Fine-Tuning AI Models, 1/22/24, TikTok LinkedIn

Axolotl provides a declarative approach to fine tuning large language models. It's very easy to get started with and much easier for folks new to AI/ML. Axolotl: https://github.com/OpenAccess-AI-Collective/axolotl

Improving Model Explainability with Synthetic Data, 1/25/24, TikTok Instagram YouTube

When you build a synthetic dataset you know where the noise is and where the signal is. This lets you better assess techniques for feature selection and model explainability. Try it out sometime. #datascience #machinelearning #syntheticdata #explainability #rajistics

DPO: Notable Advance in AI Efficiency, 1/26/24, TikTok Instagram YouTube

Direct Preference Optimization is one of the most significant advances in AI over the last six months. It provides a simpler and more efficient way to align a model's preferences. You can try it out in packages like TRL. Direct Preference Optimization (DPO) - A Simplified Explanation: https://medium.com/@joaolages/direct-preference-optimization-dpo-622fc1f18707 Direct Preference Optimization: Your Language Model is Secretly a Reward Model - https://arxiv.org/pdf/2305.18290.pdf DPO Trainer: https://huggingface.co/docs/trl/main/en/dpo_trainer
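
For intuition, the per-pair DPO objective from the linked paper can be sketched in a few lines of plain Python. This is only an illustration: the function name, the default beta, and the log-probability values in the demo are made up, and in practice you would rely on a library such as TRL rather than hand-written code.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen margin - rejected margin)))."""
    # Implicit rewards: how far the policy has moved from the reference model.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(m)) == softplus(-m), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Hypothetical sequence log-probabilities for one preference pair.
    print(dpo_loss(-4.0, -8.0, -5.0, -7.0, beta=0.1))
```

The loss falls as the policy raises the probability of the preferred answer relative to the reference model and lowers that of the rejected answer. No separate reward model is needed, which is where the efficiency comes from.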

OpenAI Models and Embedding Advances, 1/28/24, TikTok YouTube

OpenAI's new models look great and incorporate the latest advances. But don't forget about the open source as well as some tips for thinking about embeddings. Massive Text Embedding Benchmark (MTEB) Leaderboard - https://huggingface.co/spaces/mteb/leaderboard MatFormer: Nested Transformer for Elastic Inference - https://arxiv.org/pdf/2310.07707.pdf Kunal's math: https://twitter.com/TangriKunal/status/1748114153833660766 Background: davidfoxx - https://pixabay.com/videos/lake-peaceful-dock-boat-birds-fog-172007/

Code Llama Dropped, Copilot Quality Concerns, 1/30/24, TikTok Instagram YouTube

Code Llama 70B dropped, but there is also some other research on building and using copilots that is worth a look. Code Llama: https://ai.meta.com/blog/code-llama-large-language-model-coding Coding on Copilot: 2023 Data Suggests Downward Pressure on Code Quality - https://www.gitclear.com/coding_on_copilot_data_shows_ais_downward_pressure_on_code_quality Building Your Own Product Copilot: Challenges, Opportunities, and Needs - https://arxiv.org/pdf/2312.14231.pdf Background: https://pixabay.com/videos/dna-science-biology-laboratory-197931/

Exploring Understudied Large Language Models, 1/31/24, TikTok Instagram LinkedIn

Deep dive video on using explanations that come out of large language models. This is something that is understudied, but I find it quite useful. Running through a quick summary of the other relevant studies I found. You can get the slides and YT link at: https://github.com/rajshah4/LLM-Evaluation

Advancement in Truly Open-Source AI, 2/2/24, TikTok Instagram YouTube

Open is thrown about a lot in the AI community. This week Nomic and Allen AI remind us what it takes to build truly open-source AI models. They shared the training data and methods along with the models. This is a big deal because scrutiny of the training process is valuable for many reasons. Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research - https://arxiv.org/pdf/2402.00159.pdf Nomic Embed: https://blog.nomic.ai/posts/nomic-embed-text-v1 Open Language Models (OLMos) and the LLM landscape: https://www.interconnects.ai/p/olmo

Cautious Approach to New Time-Series Models, 2/4/24, TikTok Instagram YouTube

Be skeptical of new models like TimesFM from Google (but still listen). For many reasons deep learning models do not work well for time series problems. Most time series practitioners are not looking for deep learning time series models. TimesFM -- A decoder-only foundation model for time-series forecasting: https://arxiv.org/pdf/2310.10688.pdf Tranformers_Are_What_You_Dont_Need: https://github.com/valeman/Tranformers_Are_What_You_Dont_Need Nixtla TimeGPT: https://docs.nixtla.io/docs/timegpt_quickstart #timeseries #timefm #google #rajistics #deeplearningtechnique

Challenges and Progress in AI Travel Planning, 2/7/24, TikTok Instagram YouTube

Reasoning and planning with LLMs are difficult, as trip planning shows. Hopefully, now that we have a benchmark, teams will make progress on this real-world task. Travelplanner site: https://osu-nlp-group.github.io/TravelPlanner/ Travelplanner paper: https://osu-nlp-group.github.io/TravelPlanner/ #largelanguagemodels #aiplanning #travelplanner #rajistics

Consider Dimensionality in Feature Selection, 2/11/24, TikTok Instagram

Curse of dimensionality reminds us to think carefully about feature selection. More isn't always better. Use a feature selection curve. #datascience #machinelearning #curseofdimensionality #featureselection #rajistics

Effective Variables in Auto Insurance Models, 2/13/24, TikTok Instagram

The features or variables in auto insurance models. Learn from insurance: good features can give you a lot of predictive power. #datascience #machinelearning #autoinsurance #acturialscience

AI Upgrades: SORA, Gemini 1.5, V-JEPA, 2/17/24, TikTok Instagram YouTube

Great week for AI! OpenAI dropped SORA for text to video, Google released Gemini 1.5 Pro with a longer context length, and Meta released V-JEPA with a self-supervised approach for videos. Background by tommyvideo - https://pixabay.com/videos/love-hearts-valentine-symbol-shape-5240/ #openai #meta #google #rajistics #sora #gemini

Evolution of Data Science Simplified, 2/18/24, TikTok Instagram

The history of data science. I have since learned to make videos shorter and punchier.

Utilizing Recursive Feature Elimination and FIRE, 2/18/24, TikTok Instagram YouTube

One of my favorite methods for feature selection is recursive feature elimination. It's very easy to do and a starting data scientist can code this up. This is a starting point; feature selection is a much larger topic. I did a more sophisticated version of this at DataRobot that I called Feature Importance Rank Ensembling (FIRE) - https://www.datarobot.com/blog/using-feature-importance-rank-ensembling-fire-for-advanced-feature-selection/ Background: Oleksandr P at https://www.pexels.com/video/flaming-tiki-torches-on-the-beach-7357617/ #datascience #featureselection #rajistics
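
A bare-bones version of recursive feature elimination might look like the sketch below. It is not the FIRE method: it assumes a plain least-squares model, features on comparable scales, and absolute coefficient size as the importance measure. The synthetic data and all function names are hypothetical. The loop is the whole idea: fit, drop the weakest feature, refit, repeat.

```python
import random


def solve_linear(a, b):
    """Solve a x = b with Gauss-Jordan elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y, cols):
    """Least-squares coefficients (no intercept) for the selected columns."""
    xtx = [[sum(r[i] * r[j] for r in rows) for j in cols] for i in cols]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in cols]
    return solve_linear(xtx, xty)


def recursive_feature_elimination(rows, y, n_keep):
    """Refit repeatedly, dropping the feature with the smallest |coefficient|."""
    cols = list(range(len(rows[0])))
    while len(cols) > n_keep:
        coefs = fit_ols(rows, y, cols)
        weakest = min(range(len(cols)), key=lambda k: abs(coefs[k]))
        cols.pop(weakest)
    return cols


def make_data(n_rows=300, seed=0):
    """Synthetic data: only features 0 and 2 carry signal, the rest are noise."""
    rng = random.Random(seed)
    rows = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(n_rows)]
    y = [3 * r[0] - 2 * r[2] + rng.gauss(0, 1) for r in rows]
    return rows, y


if __name__ == "__main__":
    rows, y = make_data()
    print(recursive_feature_elimination(rows, y, n_keep=2))
```

In real projects the importance measure usually comes from a stronger model and is scored with cross-validation, but the elimination loop stays the same.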

Improving LLMs with Synthetic Data Tactics, 2/22/24, TikTok YouTube

Let's up your LLM game by going over the use of prompting strategies, fine tuning, and synthetic datasets. This was motivated by some great work from Predibase and Hugging Face. Synthetic data: save money, time and carbon with open source - https://huggingface.co/blog/synthetic-data-save-costs LoRA Land: Fine-Tuned Open-Source LLMs that Outperform GPT-4 - https://predibase.com/blog/lora-land-fine-tuned-open-source-llms-that-outperform-gpt-4 Background by Matthias_Groeneveld

Understanding Geospatial Analytics with H3, 2/23/24, TikTok Instagram YouTube

A little taste of geospatial analytics. Considering spatial information can be very valuable for data science and machine learning. It's good to understand how spatial data is stored and analyzed. H3 is an emerging standard for doing your analytics on location data. H3: Simplifying the World's Map - https://h3-snow.streamlit.app/ Uber H3: https://www.uber.com/blog/h3/ Background: https://www.youtube.com/watch?v=1d56UJKKk8Y&ab_channel=CARTO #geospatial #h3 #uber #rajistics #carto

Importance of Feature Engineering Skill, 2/24/24, TikTok Instagram YouTube

Feature engineering is an important part of the machine learning lifecycle. It's part art and skill. It takes time to learn and the best data scientists are good at feature engineering. Data science competitions like Kaggle often reward feature engineering skills. Tensorflow Playground: https://playground.tensorflow.org/ (I have older videos on the tensorflow playground) Background: bilalashrafmzon - https://pixabay.com/videos/animated-wallpaper-wallpaper-70960/ AI Characters with AI Parrot

XGBoost 2.0 Update Brings New Features, 2/25/24, TikTok Instagram YouTube LinkedIn

XGBoost 2.0 is out with some great new features including support for multi-target trees with vector-leaf outputs and learning to rank problems. XGBoost is widely used for solving tabular problems. (The update was released a few months ago, but that is still new in the world of gradient boosting machines.) XGBoost: https://github.com/dmlc/xgboost Background Video by Trippy Clicker: https://www.pexels.com/video/panning-shot-of-the-sea-at-sunset-6202759/ AI Characters by Parrot AI

RFM Method in Customer Lifetime Value, 2/29/24, TikTok Instagram YouTube

Customer lifetime value is a common data science use case. There are many ways to calculate this but here I introduce the classic RFM method. In Part 2 I will show a machine learning alternative. #datascience #machinelearning #rfm #customerlifetimevalue #marketinganalytics
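
As a minimal sketch of the RFM idea, the snippet below derives recency, frequency, and monetary value per customer from a transaction log. The customers, dates, amounts, and function name are all hypothetical, and the usual next step of binning each measure into scores is left out.

```python
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date, amount)
TRANSACTIONS = [
    ("alice", date(2024, 2, 20), 40.0),
    ("alice", date(2024, 1, 5), 25.0),
    ("bob", date(2023, 11, 2), 300.0),
    ("carol", date(2024, 2, 27), 15.0),
    ("carol", date(2024, 2, 1), 20.0),
    ("carol", date(2023, 12, 15), 10.0),
]


def rfm_table(transactions, as_of):
    """Return {customer: recency in days, purchase count, total spend}."""
    table = {}
    for customer, day, amount in transactions:
        row = table.setdefault(customer, {"last": day, "frequency": 0, "monetary": 0.0})
        row["last"] = max(row["last"], day)   # most recent purchase date
        row["frequency"] += 1                 # number of purchases
        row["monetary"] += amount             # total amount spent
    return {
        customer: {
            "recency_days": (as_of - row["last"]).days,
            "frequency": row["frequency"],
            "monetary": row["monetary"],
        }
        for customer, row in table.items()
    }


if __name__ == "__main__":
    for customer, row in rfm_table(TRANSACTIONS, as_of=date(2024, 2, 29)).items():
        print(customer, row)
```

Customers with low recency, high frequency, and high monetary value are the ones the classic method treats as most valuable.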

Getting ComfyUI Running for Cool Images, 2/29/24, TikTok Instagram YouTube

The hardest step is getting ComfyUI running on your computer (you need a GPU). Go do it! Then you can create the coolest images using stable diffusion. To get ComfyUI, check out: https://github.com/comfyanonymous/ComfyUI Video to get started: https://www.youtube.com/watch?v=oZY4Iem5Oz4&ab_channel=Grockster Instant ID on ComfyUI: https://www.youtube.com/watch?v=wMLiGhogOPE&ab_channel=LatentVision Also ComfyUI web - but I haven't tried this: https://comfyuiweb.com/ #comfyui #stablediffusion #rajistics

Effective Forecasting: Baselines to Machine Learning, 3/3/24, TikTok Instagram YouTube

Solid forecasting advice, proved out in the M5 forecasting competition. Start with simple baselines and statistical approaches and then add machine learning. As you start improving, make sure you think about cross validation and combining different approaches. Time series forecasting is very challenging. For more details on the competition, check out the paper: M5 accuracy competition: Results, findings, and conclusions - https://www.sciencedirect.com/science/article/pii/S0169207021001874 #forecasting #machinelearning #rajistics #parrotaistyle

Calculating Customer Lifetime Value: Part 1, 3/5/24, TikTok Instagram YouTube

Customer lifetime value is a common data science use case. There are many ways to calculate this, but here I show how a data scientist would set up the problem. In Part 1 I will show the classic RFM approach. #datascience #machinelearning #rfm #customerlifetimevalue #marketinganalyticssummit

Decoding Viral Reactions to Claude 3, 3/7/24, TikTok Instagram YouTube

Claude 3 and lots of unbelievable claims. Let's walk through some of the more viral reactions and explain what is going on. We also need to pay attention to the training data for these models and remember they are trained to act very helpful and smart. Claude 3 blog post: https://www.anthropic.com/news/claude-3-family Awareness in needle test: https://twitter.com/alexalbert__/status/1764722513014329620 Anthropic translating low resource language: https://twitter.com/hahahahohohe/status/1765088860592394250 #anthropic #claude #aihype #rajistics

Exploring Alternatives in Machine Learning, 3/9/24, TikTok Instagram YouTube

Interpretable models offer a great alternative to traditional machine learning algorithms. Generalized Additive Models like GA2M, RuleFit, and scorecards are just a few of the approaches available. To learn more, check out the resources: Interpretable Models for Machine Learning: https://towardsdatascience.com/the-art-of-sprezzatura-for-machine-learning-e2494c0db727 Imodels: https://github.com/csinva/imodels InterpretML: https://interpret.ml/ Background dedicated to Britt

Knowledge Distillation: Building Efficient Models, 3/10/24, TikTok Instagram YouTube

Knowledge distillation is a useful technique to build smaller high-performing models. DistilBERT is a great example of a widely used model trained using knowledge distillation. Resources: Distilling the Knowledge in a Neural Network - https://arxiv.org/pdf/1503.02531.pdf DistilBERT: https://arxiv.org/abs/1910.01108 Background by Roberta keiko Kitahara Santana: https://unsplash.com/photos/brown-cardboard-box-near-gray-tanks-RfL3l-I1zhc
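
The core of the technique is a loss that pulls the student's softened predictions toward the teacher's. The sketch below follows the general recipe of the linked Hinton et al. paper (temperature-scaled softmax plus a weighted hard-label term). The function names, default temperature, and weighting are assumptions for illustration and not the exact DistilBERT training setup.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target KL term and a hard-label cross-entropy term."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions.
    kl = sum(p * math.log(p / q) for p, q in zip(soft_teacher, soft_student) if p > 0)
    # Ordinary cross-entropy against the true label at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard


if __name__ == "__main__":
    teacher = [4.0, 1.5, 0.2]   # hypothetical teacher logits
    student = [2.0, 1.0, 0.5]   # hypothetical student logits
    print(distillation_loss(student, teacher, label=0))
```

The soft targets carry information about which wrong classes the teacher considers plausible, which is part of why a much smaller student can recover most of the teacher's performance.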

Solving Benchmarks Essential for Automation, 3/15/24, TikTok

WorkArena and WebArena are some newer benchmarks for real-world tasks. To build wider automation, it's going to be essential to solve these and more demanding benchmarks. Despite this, people often overestimate how much work AI will displace in the short term. WebArena: https://webarena.dev/ WorkArena: https://servicenow.github.io/WorkArena/

AI Models Navigate Diverse Data Types, 3/16/24, TikTok Instagram YouTube

AI works with various data types: tabular, unstructured, and semi-structured like JSON. While tabular data is most prevalent in enterprises, generative AI models primarily focus on unstructured and semi-structured data. #tabular #unstructured #json #semistructured #rajistics

Advancements in Llama.cpp Control Vectors, 3/17/24, TikTok Instagram YouTube LinkedIn

Control vectors are getting more widely supported, most recently in Llama.cpp. It's another useful technique alongside prompting, fine tuning, and logit bias. Resources: What is Representation Engineering: https://vgel.me/posts/representation-engineering/ Representation Engineering: A Top-Down Approach to AI Transparency - https://arxiv.org/abs/2310.01405 A library for making RepE control vectors: https://github.com/vgel/repeng/tree/main

Multimodal Pre-training, Enhanced Embeddings and RAG 2.0, 3/19/24, TikTok

Apple's MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training - https://arxiv.org/pdf/2403.09611.pdf Cohere int8 & binary Embeddings - Scale Your Vector Database to Large Datasets: https://txt.cohere.com/int8-binary-embeddings/ Contextual AI - RAG 2.0: https://contextual.ai/introducing-rag #rajistics

Data Visualization Tips on This Day, 3/22/24, TikTok

#onthisday Data visualization tips #datascience #dataviz #analytics #datavisualization

Optimizing Snowflake's AI Text-to-SQL System, 3/25/24, TikTok Instagram

To build generative AI models like the text-to-SQL system by Snowflake, it is important to create a realistic and challenging training dataset rather than relying on academic benchmarks that may be overly simplistic. Customized evaluation metrics that go beyond simple similarity scores but avoid the strictness of execution-based metrics, such as using language models for partial credit scoring, are valuable. Prompting strategies that provide relevant context like table metadata can improve performance. Finally, using a strong base model like Mistral Large can push the boundaries of what is achievable, allowing Snowflake's system to outperform even GPT-4 on the text-to-SQL task. Read the series: Part 1: All evaluation data sets are wrong, some are useful. https://medium.com/snowflake/inside-snowflake-building-the-most-powerful-sql-llm-in-the-world-95114114aab9 Part 2: Expanding Evaluation to be user-centric...with LLMs https://medium.com/snowflake/inside-snowflake-building-the-most-powerful-sql-llm-in-the-world-1a33b3ee0d37 Part 3: Retrieve, Prompt, Generate https://medium.com/snowflake/inside-snowflake-building-the-most-powerful-sql-llm-in-the-world-05490f0d1ac7 Part 4: Mistral & Snowflake: The New Frontier in SQL Copilot Products https://medium.com/snowflake/mistral-snowflake-the-new-frontier-in-sql-copilot-products-f71b8a939899

Updating Training Methods for Language Models, 3/27/24, TikTok Instagram

This is a year old but still holds up pretty well. The big difference is you may want to use TRL instead of PEFT for the training. But the concepts around efficiently training a large language model using PEFT and LoRA hold up. #datascience #machinelearning #largelanguagemodels #flant5 #peft #LoRA #finetuning YouTube Video: https://youtu.be/YKCtbIJC3kQ?si=PL3A1mgMXmuVbZqe Blog Post: https://www.philschmid.de/fine-tune-flan-t5-peft

Comparing Text Generation Techniques: Beam Search, 3/30/24, TikTok Instagram YouTube

Beam search is an alternative way for LLMs to generate text. Let's walk through how beam search compares to greedy search. Alternatives include using temperature or Top K sampling. Resources: Beam Search Visualizer: https://huggingface.co/spaces/m-ric/beam_search_visualizer How to generate text: using different decoding methods for language generation with Transformers: https://huggingface.co/blog/how-to-generate #largelanguagemodels #beamsearch #textgeneration #rajistics
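
The difference between the two strategies is easy to see on a toy example. The sketch below uses a made-up table of next-token probabilities (not a real model) and hypothetical function names. Greedy search commits to the single best token at every step, while beam search keeps several partial sequences alive and can recover a sequence whose first token looked slightly worse.

```python
import math

# Hypothetical next-token probabilities keyed by the tokens generated so far.
TOY_MODEL = {
    ("The",): {"nice": 0.5, "dog": 0.4, "car": 0.1},
    ("The", "nice"): {"woman": 0.4, "house": 0.3, "guy": 0.3},
    ("The", "dog"): {"has": 0.9, "runs": 0.05, "and": 0.05},
    ("The", "car"): {"is": 0.3, "drives": 0.5, "turns": 0.2},
}


def greedy_search(start, steps):
    """Pick the single most likely next token at every step."""
    seq, logp = tuple(start), 0.0
    for _ in range(steps):
        token, p = max(TOY_MODEL[seq].items(), key=lambda kv: kv[1])
        seq, logp = seq + (token,), logp + math.log(p)
    return seq, math.exp(logp)


def beam_search(start, steps, num_beams):
    """Keep the num_beams most likely partial sequences at every step."""
    beams = [(0.0, tuple(start))]
    for _ in range(steps):
        candidates = [
            (logp + math.log(p), seq + (token,))
            for logp, seq in beams
            for token, p in TOY_MODEL[seq].items()
        ]
        beams = sorted(candidates, reverse=True)[:num_beams]
    best_logp, best_seq = beams[0]
    return best_seq, math.exp(best_logp)


if __name__ == "__main__":
    print("greedy:", greedy_search(["The"], steps=2))
    print("beam:  ", beam_search(["The"], steps=2, num_beams=2))
```

With these numbers greedy search ends on "The nice woman", while two beams find the more probable "The dog has". With a single beam, beam search reduces to greedy search.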

GPT4's Planning Abilities in Experiment, 3/30/24, TikTok Instagram

An experiment studying how well GPT4 can plan by using Block World and Mystery World. Now updated with results from Claude and Gemini Pro. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark) - https://arxiv.org/abs/2305.15771 #largelanguagemodels #gpt4 #aiplanning #blockworld #mysteryworld #claude #geminiprodfl

Template by

Manu Arora