Data Science Portfolio - Kaggle Datasets & AI Projects | Artificial Intelligence
Free Datasets For Data Science Projects & Portfolio

The best way to learn data analytics skills is to:

1. Watch a tutorial

2. Immediately practice what you just learned

3. Do projects to apply your learning to real-life applications

If you only watch videos and never practice, you won’t retain what you learned.

If you never apply your learning with projects, you won’t be able to solve problems on the job. (You will also have a much harder time attracting recruiters without a portfolio.)

Learn Data Science for FREE (No Strings Attached)

No fancy courses, no conditions, just pure learning.

Here’s how to become a Data Scientist for FREE:

1️⃣ Python Programming for Data Science → Harvard’s CS50P
The best intro to Python for absolute beginners:
↬ Covers loops, data structures, and practical exercises.
↬ Designed to help you build foundational coding skills.

Link: https://cs50.harvard.edu/python/

https://xn--r1a.website/datasciencefun

2️⃣ Statistics & Probability → Khan Academy
Want to master probability, distributions, and hypothesis testing? This is where to start:
↬ Clear, beginner-friendly videos.
↬ Exercises to test your skills.

Link: https://www.khanacademy.org/math/statistics-probability

https://whatsapp.com/channel/0029Vat3Dc4KAwEcfFbNnZ3O

3️⃣ Linear Algebra for Data Science → 3Blue1Brown
↬ Learn about matrices, vectors, and transformations.
↬ Essential for machine learning models.

Link: https://www.youtube.com/playlist?list=PLZHQObOWTQDMsr9KzVk3AjplI5PYPxkUr

4️⃣ SQL Basics → Mode Analytics
SQL is the backbone of data manipulation. This tutorial covers:
↬ Writing queries, joins, and filtering data.
↬ Real-world datasets to practice.

Link: https://mode.com/sql-tutorial

https://whatsapp.com/channel/0029VanC5rODzgT6TiTGoa1v

5️⃣ Data Visualization → freeCodeCamp
Learn to create stunning visualizations using Python libraries:
↬ Covers Matplotlib, Seaborn, and Plotly.
↬ Step-by-step projects included.

Link: https://www.youtube.com/watch?v=JLzTJhC2DZg

https://whatsapp.com/channel/0029VaxaFzoEQIaujB31SO34

6️⃣ Machine Learning Basics → Google’s Machine Learning Crash Course
An in-depth introduction to machine learning for beginners:
↬ Learn supervised and unsupervised learning.
↬ Hands-on coding with TensorFlow.

Link: https://developers.google.com/machine-learning/crash-course

7️⃣ Deep Learning → Fast.ai’s Free Course
Fast.ai makes deep learning easy and accessible:
↬ Build neural networks with PyTorch.
↬ Learn by coding real projects.

Link: https://course.fast.ai/

8️⃣ Data Science Projects → Kaggle
↬ Compete in challenges to practice your skills.
↬ Great way to build your portfolio.

Link: https://www.kaggle.com/

🔰 Python program to convert text to speech

⚠️ Mistakes Beginners Repeat for Years

❌ Ignoring fundamentals
❌ Copy-pasting without understanding
❌ Overusing frameworks
❌ Avoiding debugging
❌ Skipping tests
❌ Fear of refactoring

React 🧡 if you want more of this type of content

#techinfo

✅ GitHub Profile Tips for Data Analysts 🌐💼

Your GitHub is more than code: it’s your digital resume. Here's how to make it stand out:

1️⃣ Clean README (Profile)
• Add your name, title & tools
• Short about section
• Include: skills, top projects, certificates, contact
✅ Example:
“Hi, I’m Rahul – a Data Analyst skilled in SQL, Python & Power BI.”

2️⃣ Pin Your Best Projects
• Show 3–6 strong repos
• Add a clear README for each project:
- What it does
- Tools used
- Screenshots or demo links
✅ Bonus: Include real data or visuals

3️⃣ Use Commits & Contributions
• Contribute regularly
• Avoid empty profiles
✅ Small, regular commits > one big push once a month

4️⃣ Upload Resume Projects
• Excel dashboards
• SQL queries
• Python notebooks (Jupyter)
• BI project links (Power BI / Tableau Public)

5️⃣ Add Descriptions & Tags
• Use repo topics (tags): sql, python, eda, dashboard
• Write a short project summary in the repo description

🧠 Tips:
• Push only clean, working code
• Use folders, not messy files
• Add your LinkedIn link to your profile bio

📌 Practice Task:
Upload your latest project → Write a README → Pin it to your profile

💬 Tap ❤️ for more!

🚨 Anthropic dropped a FREE 33-page playbook revealing Claude's very own cheat code:

The 'Skills' folder.

Spend 30 minutes building it,
and you’ll never have to explain your process again.

Top-tier users don't just type commands, they build systems.

Grab your free copy of Anthropic's official guide to building Claude skills right here: https://resources.anthropic.com/hubfs/The-Complete-Guide-to-Building-Skill-for-Claude.pdf

✅ Useful Platforms to Practice SQL Programming 🧠🖥️

Learning SQL is just the first step; practice is what builds real skill. Here are the best platforms for hands-on SQL:

1️⃣ LeetCode – For Interview-Oriented SQL Practice
• Focus: Real interview-style problems
• Levels: Easy to Hard
• Schema + Sample Data Provided
• Great for: Data Analyst, Data Engineer, FAANG roles
✔ Tip: Start with Easy → filter by “Database” tag
✔ Popular Section: Database → Top 50 SQL Questions
Example Problem: “Find duplicate emails in a user table” → Practice filtering, GROUP BY, HAVING
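
To try the duplicate-emails pattern without signing up anywhere, here is a minimal sketch using Python's built-in sqlite3 module. The table name `users`, its columns, and the sample rows are assumptions made up for illustration, not LeetCode's actual schema.

```python
import sqlite3


def find_duplicate_emails(rows):
    """Return emails that appear more than once, sorted alphabetically.

    `rows` is a list of (id, email) tuples loaded into a throwaway
    in-memory table (hypothetical schema, for practice only).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO users (id, email) VALUES (?, ?)", rows)

    # GROUP BY collapses rows per email; HAVING filters the groups.
    query = """
        SELECT email
        FROM users
        GROUP BY email
        HAVING COUNT(*) > 1
        ORDER BY email
    """
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result


sample_rows = [
    (1, "a@example.com"),
    (2, "b@example.com"),
    (3, "a@example.com"),
]
print(find_duplicate_emails(sample_rows))
```

WHERE filters rows before grouping and HAVING filters groups after aggregation, which is why the count check has to live in HAVING.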

2️⃣ HackerRank – Structured & Beginner-Friendly
• Focus: Step-by-step SQL track
• Has certification tests (SQL Basic, Intermediate)
• Problem sets by topic: SELECT, JOINs, Aggregations, etc.
✔ Tip: Follow the full SQL track
✔ Bonus: Company-specific challenges
Try: “Revising Aggregations – The Count Function” → Build confidence with small wins

3️⃣ Mode Analytics – Real-World SQL in Business Context
• Focus: Business intelligence + SQL
• Uses real-world datasets (e.g., e-commerce, finance)
• Has an in-browser SQL editor with live data
✔ Best for: Practicing dashboard-level queries
✔ Tip: Try the SQL case studies & tutorials

4️⃣ StrataScratch – Interview Questions from Real Companies
• 500+ problems from companies like Uber, Netflix, Google
• Split by company, difficulty, and topic
✔ Best for: Intermediate to advanced level
✔ Tip: Try “Hard” questions after doing 30–50 easy/medium

5️⃣ DataLemur – Short, Practical SQL Problems
• Crisp and to the point
• Good UI, fast learning
• Real interview-style logic
✔ Use when: You want fast, smart SQL drills

📌 How to Practice Effectively:
• Spend 20–30 mins/day
• Focus on JOINs, GROUP BY, HAVING, Subqueries
• Analyze problem → write → debug → re-write
• After solving, explain your logic out loud
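
The four focus areas above (JOINs, GROUP BY, HAVING, subqueries) can all show up in a single query. Here is a small sketch with Python's built-in sqlite3 module. The `customers` and `orders` tables and every value in them are hypothetical, invented only so the query has something to run on.

```python
import sqlite3


def customers_above_average_order(customers, orders):
    """Return (name, total_spent) for customers whose total spend
    is greater than the average amount of a single order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)

    query = """
        SELECT c.name, SUM(o.amount) AS total_spent
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id      -- JOIN
        GROUP BY c.id, c.name                         -- GROUP BY
        HAVING SUM(o.amount) > (                      -- HAVING
            SELECT AVG(amount) FROM orders            -- subquery
        )
        ORDER BY total_spent DESC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample_customers = [(1, "Asha"), (2, "Ben"), (3, "Chen")]
sample_orders = [(1, 1, 50.0), (2, 1, 70.0), (3, 2, 20.0), (4, 3, 200.0)]
print(customers_above_average_order(sample_customers, sample_orders))
```

A good way to follow the "explain your logic out loud" tip is to walk through the query in execution order: join, group, filter groups, then sort.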

🧪 Practice Task:
Try solving 5 SQL questions from LeetCode or HackerRank this week. Start with SELECT, WHERE, and GROUP BY.

💬 Tap ❤️ for more!

Here is a list of a few projects (found on Kaggle). They cover the basics of Python, advanced statistics, supervised learning (regression and classification problems), and data science.

Also check the discussions and notebook submissions for different approaches and solutions after you have tried the problems yourself.

1. Basic Python and statistics

Pima Indians Diabetes: https://www.kaggle.com/uciml/pima-indians-diabetes-database
Cardio Good Fitness: https://www.kaggle.com/saurav9786/cardiogoodfitness
Automobile: https://www.kaggle.com/toramky/automobile-dataset

2. Advanced statistics

Game of Thrones: https://www.kaggle.com/mylesoneill/game-of-thrones
World University Rankings: https://www.kaggle.com/mylesoneill/world-university-rankings
IMDB Movie Dataset: https://www.kaggle.com/carolzhangdc/imdb-5000-movie-dataset

3. Supervised learning

a) Regression problems

How Much Did It Rain: https://www.kaggle.com/c/how-much-did-it-rain-ii/overview
Inventory Demand: https://www.kaggle.com/c/grupo-bimbo-inventory-demand
Property Inspection Prediction: https://www.kaggle.com/c/liberty-mutual-group-property-inspection-prediction
Restaurant Revenue Prediction: https://www.kaggle.com/c/restaurant-revenue-prediction/data
TMDB Box Office Prediction: https://www.kaggle.com/c/tmdb-box-office-prediction/overview

b) Classification problems

Employee Access Challenge: https://www.kaggle.com/c/amazon-employee-access-challenge/overview
Titanic: https://www.kaggle.com/c/titanic
San Francisco Crime: https://www.kaggle.com/c/sf-crime
Customer Satisfaction: https://www.kaggle.com/c/santander-customer-satisfaction
Trip Type Classification: https://www.kaggle.com/c/walmart-recruiting-trip-type-classification
Categorize Cuisine: https://www.kaggle.com/c/whats-cooking

4. Some helpful data science projects for beginners

https://www.kaggle.com/c/house-prices-advanced-regression-techniques

https://www.kaggle.com/c/digit-recognizer

https://www.kaggle.com/c/titanic

5. Intermediate-level data science projects

Black Friday Data: https://www.kaggle.com/sdolezel/black-friday

Human Activity Recognition Data: https://www.kaggle.com/uciml/human-activity-recognition-with-smartphones

Trip History Data: https://www.kaggle.com/pronto/cycle-share-dataset

Million Song Data: https://www.kaggle.com/c/msdchallenge

Census Income Data: https://www.kaggle.com/c/census-income/data

Movie Lens Data: https://www.kaggle.com/grouplens/movielens-20m-dataset

Twitter Classification Data: https://www.kaggle.com/c/twitter-sentiment-analysis2

Share with credits: https://xn--r1a.website/sqlproject

ENJOY LEARNING 👍👍

🔹 DATA SCIENCE – INTERVIEW REVISION SHEET

1️⃣ What is Data Science?
> “Data science is the process of using data, statistics, and machine learning to extract insights and build predictive or decision-making models.”

Difference from Data Analytics:
• Data Analytics → past & present (what/why)
• Data Science → future & automation (what will happen)

2️⃣ Data Science Lifecycle (Very Important)
1. Business problem understanding
2. Data collection
3. Data cleaning & preprocessing
4. Exploratory Data Analysis (EDA)
5. Feature engineering
6. Model building
7. Model evaluation
8. Deployment & monitoring
Interview line:
> “I always start from business understanding, not the model.”

3️⃣ Data Types
• Structured → tables, SQL
• Semi-structured → JSON, logs
• Unstructured → text, images

4️⃣ Statistics You MUST Know
• Central tendency: Mean, Median (prefer the median when outliers exist)
• Spread: Variance, Standard deviation
• Correlation ≠ causation
• Normal distribution
• Skewness (income → right skewed)
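
To see why the median is the safer choice when outliers exist, here is a minimal sketch with Python's built-in statistics module. The income figures are made up for illustration. One very large value drags the mean far above the median, which is the signature of a right-skewed distribution.

```python
import statistics

# Hypothetical yearly incomes: five typical values and one extreme outlier.
incomes = [30_000, 32_000, 35_000, 38_000, 40_000, 1_000_000]

mean_income = statistics.mean(incomes)      # pulled up by the outlier
median_income = statistics.median(incomes)  # middle of the sorted values
std_income = statistics.stdev(incomes)      # sample standard deviation

# In a right-skewed sample the mean usually sits above the median.
is_right_skewed = mean_income > median_income

print(f"mean={mean_income:.0f} median={median_income:.0f} std={std_income:.0f}")
print("right skewed:", is_right_skewed)
```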

5️⃣ Data Cleaning & Preprocessing
Steps you should say in interviews:
1. Handle missing values
2. Remove duplicates
3. Treat outliers
4. Encode categorical variables
5. Scale numerical data
Scaling:
• Min-Max → bounded range (usually 0 to 1)
• Standardization → zero mean and unit variance (it rescales the data but does not make it normally distributed)
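
The two scaling methods are short enough to write by hand. This is a plain-Python sketch for intuition, with hypothetical function names. In a real project you would more likely use a library scaler fitted on the training split only.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


ages = [10, 20, 30]
print(min_max_scale(ages))
print(standardize(ages))
```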

6️⃣ Feature Engineering (Interview Favorite)
> “Feature engineering is creating meaningful input variables that improve model performance.”
Examples:
• Extract month from date
• Create customer lifetime value
• Binning age groups
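
Two of the examples above, extracting the month and binning ages, fit in a few lines of plain Python. The field names (`signup_date`, `age`) and the bin edges are assumptions chosen for illustration, so pick edges that make sense for your own data.

```python
from datetime import date


def month_from_date(iso_date):
    """Extract the month number (1-12) from a 'YYYY-MM-DD' string."""
    return date.fromisoformat(iso_date).month


def age_group(age):
    """Bin a numeric age into a coarse, hypothetical set of groups."""
    if age < 18:
        return "under_18"
    if age < 35:
        return "18_34"
    if age < 60:
        return "35_59"
    return "60_plus"


def add_features(record):
    """Return a copy of the record with two engineered features added."""
    enriched = dict(record)
    enriched["signup_month"] = month_from_date(record["signup_date"])
    enriched["age_group"] = age_group(record["age"])
    return enriched


customer = {"signup_date": "2024-03-15", "age": 41}
print(add_features(customer))
```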

7️⃣ Machine Learning Basics
• Supervised learning: Regression, Classification
• Unsupervised learning: Clustering, Dimensionality reduction

8️⃣ Common Algorithms (Know WHEN to use)
• Regression: Linear regression → continuous output
• Classification: Logistic regression, Decision tree, Random forest, SVM
• Unsupervised: K-Means → segmentation, PCA → dimensionality reduction

9️⃣ Overfitting vs Underfitting
• Overfitting → model memorizes training data
• Underfitting → model too simple
Fixes (mainly for overfitting):
• Regularization
• More data
• Cross-validation
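
Cross-validation is easier to explain once you have seen the split itself. Below is a minimal sketch of k-fold index splitting in plain Python. The function name is hypothetical, and it uses contiguous folds without shuffling, whereas real projects usually shuffle first or use a library splitter.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds.

    Every sample lands in exactly one validation fold; when n_samples
    is not divisible by k, the first folds get one extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")

    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


for train_idx, val_idx in k_fold_indices(n_samples=5, k=2):
    print("train:", train_idx, "validation:", val_idx)
```

A large gap between the training score and the average validation score across folds is the usual sign of overfitting.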

🔟 Model Evaluation Metrics
• Classification: Accuracy, Precision, Recall, F1 score, ROC-AUC
• Regression: MAE, RMSE
Interview line:
> “Metric selection depends on business problem.”
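
Interviewers often ask you to compute these metrics by hand, so here is a plain-Python sketch of the standard definitions. The function names and the tiny label lists are made up for illustration. ROC-AUC is left out because it needs predicted scores rather than hard labels.

```python
import math


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large errors more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


print(classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
print(mae([3, 5], [2, 7]), rmse([3, 5], [2, 7]))
```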

1️⃣1️⃣ Imbalanced Data Techniques
• Class weighting
• Oversampling / undersampling
• SMOTE
• Metric preference: Precision, Recall, F1, ROC-AUC
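
Here is a rough plain-Python sketch of the first two techniques. The weights use the common n_samples / (n_classes * class_count) heuristic, the same formula behind scikit-learn's "balanced" option. The oversampler simply duplicates minority rows at random. It is not SMOTE, which synthesizes new points instead of copying existing ones. Function names and data are hypothetical.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def random_oversample(rows, labels, seed=0):
    """Duplicate random minority rows until every class matches the largest.

    Apply this to the training split only, never to validation or test data.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, cnt in counts.items():
        pool = [row for row, label in zip(rows, labels) if label == cls]
        for _ in range(target - cnt):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


labels = [0, 0, 0, 1]
rows = [[1.0], [1.2], [0.9], [5.0]]
print(balanced_class_weights(labels))
print(Counter(random_oversample(rows, labels)[1]))
```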

1️⃣2️⃣ Python for Data Science
Core libraries:
• NumPy
• Pandas
• Matplotlib / Seaborn
• Scikit-learn
Must know:
• loc vs iloc
• Groupby
• Vectorization

1️⃣3️⃣ Model Deployment (Basic Understanding)
• Batch prediction
• Real-time prediction
• Model monitoring
• Model drift
Interview line:
> “Models must be monitored because data changes over time.”

1️⃣4️⃣ Explain Your Project (Template)
> “The goal was ___. I cleaned the data using ___. I performed EDA to identify ___. I built a ___ model and evaluated it using ___. The final outcome was ___.”

1️⃣5️⃣ HR-Style Data Science Answers
Why data science?
> “I enjoy solving complex problems using data and building models that automate decisions.”
Biggest challenge:
> “Handling messy real-world data.”
Strength:
> “Strong foundation in statistics and ML.”

🔥 LAST-DAY INTERVIEW TIPS
• Explain intuition, not math
• Don’t jump to algorithms immediately
• Always connect model → business value
• State your assumptions clearly

Double Tap ♥️ For More

If I need to teach someone data analytics from the basics, here is my strategy:

1. I will first remove that person's fear of tools.

2. I will start with Excel because it looks familiar and is easy to use.

3. I put more emphasis on projects, at least 5 to 6 in Excel, because in industry you learn by doing.

4. I will pull the person out of tutorial hell and make them more action-oriented.

5. Then I move to SQL because every job wants it. Even with AI tools, you need a strong understanding of it if you are going to use it daily.

6. Once the understanding is strong, I will push the person to solve 100 to 150 SQL problems, from basic to advanced.

7. This helps the person develop analytical thinking.

8. Then I push the person to solve 3 case studies, which show how data is pulled in real life.

9. Then I move the person to Power BI for another 5 projects, using either SQL or Excel files as the source.

10. Now the fear is removed.

11. Now I push the person to solve unguided challenges and present them in video recordings, which improves problem-solving, communication, and data storytelling skills.

12. It also helps you clear the case study round that most companies conduct.

13. Now I help the person present these projects on their resume and show how these tools are used in the real world.

14. The interesting fact is that all of the above is available for free on YouTube, and I also mentor people through existing YouTube videos.

15. But people get stuck in tutorial hell, lose motivation, and stay confused about whether they are heading in the right direction.

16. As a personal mentor, I help them get out of tutorial hell and set them in the right direction, and they stay motivated once they start to see the difference before and after mentorship.

I have curated 80+ top-notch Data Analytics Resources 👇👇
https://topmate.io/analyst/861634

Hope this helps you 😊

Real-world Data Science project ideas: 💡📈

1. Credit Card Fraud Detection

📍 Tools: Python (Pandas, Scikit-learn)

Use a real credit card transactions dataset to detect fraudulent activity using classification models.

Skills you build: Data preprocessing, class imbalance handling, logistic regression, confusion matrix, model evaluation.

2. Predictive Housing Price Model

📍 Tools: Python (Scikit-learn, XGBoost)

Build a regression model to predict house prices based on various features like size, location, and amenities.

Skills you build: Feature engineering, EDA, regression algorithms, RMSE evaluation.

3. Sentiment Analysis on Tweets or Reviews

📍 Tools: Python (NLTK / TextBlob / Hugging Face)

Analyze customer reviews or Twitter data to classify sentiment as positive, negative, or neutral.

Skills you build: Text preprocessing, NLP basics, vectorization (TF-IDF), classification.
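
Before reaching for a library vectorizer, it helps to see what TF-IDF actually computes. This is a minimal sketch of the textbook formula, tf * log(N / df), with whitespace tokenization. Library implementations such as scikit-learn's use a smoothed variant, so their numbers will differ. The function name and the two sample reviews are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(number of documents / documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Count each term once per document it appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return vectors


reviews = ["good movie", "bad movie"]
for vector in tf_idf(reviews):
    print(vector)
```

A word that appears in every document ("movie" here) gets weight zero, which is the point: it carries no signal for telling the reviews apart.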


4. Stock Price Prediction

📍 Tools: Python (LSTM / Prophet / ARIMA)

Use time series models to predict future stock prices based on historical data.

Skills you build: Time series forecasting, data visualization, recurrent neural networks, trend/seasonality analysis.

5. Image Classification with CNN

📍 Tools: Python (TensorFlow / PyTorch)

Train a Convolutional Neural Network to classify images (e.g., cats vs dogs, handwritten digits).

Skills you build: Deep learning, image preprocessing, CNN layers, model tuning.

6. Customer Segmentation with Clustering

📍 Tools: Python (K-Means, PCA)

Use unsupervised learning to group customers based on purchasing behavior.

Skills you build: Clustering, dimensionality reduction, data visualization, customer profiling.

7. Recommendation System

📍 Tools: Python (Surprise / Scikit-learn / Pandas)

Build a recommender system (e.g., movies, products) using collaborative or content-based filtering.

Skills you build: Similarity metrics, matrix factorization, cold start problem, evaluation (RMSE, MAE).
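
The "similarity metrics" part of a recommender can be prototyped in a few lines. Below is a sketch of item-to-item cosine similarity in plain Python. The movie names and ratings are invented, 0 is assumed to mean "not rated", and a real system would handle missing ratings more carefully.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:  # an item nobody rated
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_item(ratings, target):
    """Return the item whose rating vector is closest to the target's."""
    scores = {
        item: cosine_similarity(ratings[target], vector)
        for item, vector in ratings.items()
        if item != target
    }
    return max(scores, key=scores.get)


# Hypothetical ratings: one list per movie, one position per user.
ratings = {
    "Movie A": [5, 4, 0, 1],
    "Movie B": [4, 5, 0, 1],
    "Movie C": [0, 1, 5, 4],
}
print(most_similar_item(ratings, "Movie A"))
```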


👉 Pick 2–3 projects aligned with your interests.
👉 Document everything on GitHub, and post about your learnings on LinkedIn.

Here you can find the project datasets: https://whatsapp.com/channel/0029VbAbnvPLSmbeFYNdNA29

React ❤️ for more