Segment Anything
The Segment Anything project aims to democratize image segmentation in computer vision, a core task used across various applications such as scientific imagery analysis and photo editing. Traditionally, accurate segmentation models require specialized expertise, AI training infrastructure, and large amounts of annotated data. This project introduces a new task, dataset, and model for image segmentation to overcome these challenges and make segmentation more accessible.
The researchers are releasing the Segment Anything Model (SAM) and the Segment Anything 1-Billion mask dataset (SA-1B), the largest segmentation dataset to date. These resources will enable a wide range of applications and further research into foundation models for computer vision. The SA-1B dataset is available for research purposes, while SAM is released under the permissive Apache 2.0 license. Users can explore the demo to try SAM with their own images.
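Beyond the demo, the repository exposes a small Python API for prompt-based segmentation. A minimal sketch, assuming the ViT-H checkpoint from the project page has been downloaded (file name as listed in the README):

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load SAM (ViT-H variant); the checkpoint file is downloaded separately.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Stand-in for a real RGB image (H x W x 3, uint8), e.g. loaded via PIL/OpenCV.
image = np.zeros((480, 640, 3), dtype=np.uint8)
predictor.set_image(image)

# Prompt with a single foreground point; SAM returns candidate masks with scores.
masks, scores, logits = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),  # 1 = foreground point, 0 = background
    multimask_output=True,       # return several mask hypotheses
)
```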
Paper link: https://arxiv.org/abs/2304.02643
Code link: https://github.com/facebookresearch/segment-anything
Demo link: https://segment-anything.com/demo
Blogpost link: https://ai.facebook.com/blog/segment-anything-foundation-model-image-segmentation/
Dataset link: https://ai.facebook.com/datasets/segment-anything/
A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-sam
#deeplearning #cv #pytorch #imagesegmentation #dataset
DINOv2: Learning Robust Visual Features without Supervision
Get ready for a game-changer in computer vision! Building on the groundbreaking achievements in natural language processing, foundation models are revolutionizing the way we use images in various systems. By generating all-purpose visual features that excel across diverse image distributions and tasks without finetuning, these models are set to redefine the field.
The researchers behind this work have combined cutting-edge techniques to scale pretraining in terms of data and model size, turbocharging the training process like never before. They've devised an ingenious automatic pipeline to create a rich, diverse, and curated image dataset, setting a new standard in the self-supervised literature. To top it off, they've trained a colossal ViT model with a staggering 1 billion parameters and distilled it into a series of smaller, ultra-efficient models. These models outshine the best available all-purpose features, OpenCLIP, on most benchmarks at both image and pixel levels.
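The distilled checkpoints are straightforward to try: the project publishes pretrained backbones via torch.hub. A minimal sketch (model name follows the repository's naming scheme; weights download on first call):

```python
import torch

# Load the distilled ViT-S/14 backbone from the DINOv2 repository.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

# DINOv2 expects image sides divisible by the patch size (14); 224 = 16 * 14.
dummy = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    features = model(dummy)  # global image embedding, (1, 384) for ViT-S/14
print(features.shape)
```

These frozen features can then feed a linear or k-NN head without any finetuning, which is the headline claim of the paper.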
A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-dinov2
Project link: https://dinov2.metademolab.com/
#deeplearning #cv #pytorch #imagesegmentation #sota #pretraining
Forwarded from Machinelearning
1. Model distillation guide from OpenAI
The guide gives a detailed walkthrough of transferring knowledge from a larger model to a compact one while preserving high performance.
Key aspects covered in the guide:
- Storing the large model's outputs: building a dataset of the large model's predictions to be used for training the smaller model (see the sketch after this list).
- Evaluating model performance: comparing the accuracy and efficiency of both the large and the compact model across various metrics.
- Creating training data for the compact model: using the large model's predictions to generate a training set that supports effective training of the smaller model.
- Evaluating the fine-tuned compact model: checking the compact model's performance and accuracy after distillation to confirm it meets requirements.
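The guide itself works through OpenAI's platform tooling, but the first step above is framework-agnostic. A hedged, PyTorch-flavoured sketch of caching a teacher's predictions as soft training targets; `teacher`, `loader`, and the file name are placeholders, not part of the guide:

```python
import torch

@torch.no_grad()
def cache_teacher_outputs(teacher, loader, path="teacher_logits.pt"):
    """Run the large model once over the data and store its predictions."""
    teacher.eval()
    records = []
    for inputs, _ in loader:
        logits = teacher(inputs)          # teacher predictions become soft labels
        records.append((inputs.cpu(), logits.cpu()))
    torch.save(records, path)             # the student later trains against these
```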
2. Knowledge distillation tutorial from PyTorch
A PyTorch guide offering a hands-on introduction to knowledge transfer for deploying models on devices with limited compute.
Key aspects of the guide:
- Extracting hidden representations: shows how to obtain intermediate representations from a trained model for further use.
- Modifying PyTorch training loops: covers integrating additional loss functions into standard training loops for effective knowledge transfer.
- A worked example of training a compact model using a larger model's predictions as a guide.
The tutorial provides step-by-step instructions and code examples, making it a valuable resource if you want to learn how to optimize your models for resource-constrained environments; a condensed sketch of its core loss follows below.
▪️ Link
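The heart of that tutorial is a training step that mixes a hard-label loss with a tempered soft-label term. A condensed sketch (the temperature `T` and weight `alpha` are illustrative values, not the tutorial's exact settings):

```python
import torch
import torch.nn.functional as F

T = 2.0        # temperature: softens both probability distributions
alpha = 0.25   # weight of the hard-label cross-entropy term

def kd_step(student, teacher, inputs, labels, optimizer):
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(inputs)
    student_logits = student(inputs)

    # KL divergence between temperature-softened distributions, scaled by T^2
    # so its gradient magnitude stays comparable to the plain CE loss.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard_loss = F.cross_entropy(student_logits, labels)
    loss = alpha * hard_loss + (1 - alpha) * soft_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```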
3. Jetson Introduction to Knowledge Distillation from Nvidia
This guide walks through distilling knowledge from OpenCLIP (a vision-language model) into a ResNet18 model for classification on the STL10 dataset.
Particular attention is paid to how the choice of data, the distillation method, and the model architecture affect the final accuracy.
It also discusses profiling and optimizing models for deployment on NVIDIA Jetson Orin Nano devices.
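The full code lives in the linked Nvidia repo; as a rough illustration of one variant it covers — regressing the student onto the frozen teacher's image embeddings (the tutorial also distills CLIP's zero-shot logits) — here is a sketch with illustrative model and checkpoint tags:

```python
import torch
import torch.nn as nn
import open_clip
from torchvision.models import resnet18

# Frozen OpenCLIP teacher (model/pretrained tags are illustrative).
teacher, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
teacher.eval()

# Student head sized to the teacher's 512-d image embedding.
student = resnet18(num_classes=512)

def distill_batch(images, optimizer):
    """One feature-distillation step; `images` is a preprocessed batch."""
    with torch.no_grad():
        target = teacher.encode_image(images)    # frozen teacher embeddings
    pred = student(images)
    loss = nn.functional.mse_loss(pred, target)  # match the teacher's features
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```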
4. Knowledge distillation tutorial from Keras
Describes the concept of knowledge distillation in detail and its application to medical image processing; a simplified sketch of the typical distiller follows below.
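In that tutorial the distiller is a small keras.Model wrapper around a frozen teacher and a trainable student. A simplified sketch in that spirit — not the tutorial's exact code, and the temperature/weighting values are assumptions:

```python
import tensorflow as tf
import keras

class Distiller(keras.Model):
    def __init__(self, student, teacher, temperature=3.0, alpha=0.1):
        super().__init__()
        self.student, self.teacher = student, teacher
        self.temperature, self.alpha = temperature, alpha

    def train_step(self, data):
        x, y = data
        teacher_pred = self.teacher(x, training=False)  # frozen teacher
        with tf.GradientTape() as tape:
            student_pred = self.student(x, training=True)
            # Hard-label CE plus KL between temperature-softened distributions.
            student_loss = tf.reduce_mean(
                keras.losses.sparse_categorical_crossentropy(
                    y, student_pred, from_logits=True
                )
            )
            distill_loss = tf.reduce_mean(
                keras.losses.kl_divergence(
                    tf.nn.softmax(teacher_pred / self.temperature),
                    tf.nn.softmax(student_pred / self.temperature),
                )
            ) * self.temperature**2
            loss = self.alpha * student_loss + (1 - self.alpha) * distill_loss
        grads = tape.gradient(loss, self.student.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.student.trainable_variables))
        return {"loss": loss}

# Usage: Distiller(student, teacher).compile(optimizer="adam") then .fit(...)
```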
5. Distillation guide from huggingface 🤗
Shows how to perform knowledge distillation step by step on a concrete example.
6. Knowledge distillation for computer vision tasks from huggingface
Covers how to distill a fine-tuned ViT model into MobileNet using the Trainer API from Transformers; a sketch of the pattern follows below.
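The pattern in that guide boils down to a Trainer subclass whose loss adds a KL term against the teacher's logits. A hedged sketch (class and argument names are illustrative, not the guide's exact code; assumes the batches include labels so `outputs.loss` is defined):

```python
import torch
import torch.nn.functional as F
from transformers import Trainer

class DistillationTrainer(Trainer):
    def __init__(self, teacher_model=None, temperature=2.0, alpha=0.5, **kwargs):
        super().__init__(**kwargs)
        self.teacher = teacher_model.eval()  # frozen teacher
        self.temperature = temperature
        self.alpha = alpha

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        outputs = model(**inputs)            # student forward (includes CE loss)
        with torch.no_grad():
            teacher_outputs = self.teacher(**inputs)
        # Soft-label KD term between tempered student/teacher distributions.
        kd = F.kl_div(
            F.log_softmax(outputs.logits / self.temperature, dim=-1),
            F.softmax(teacher_outputs.logits / self.temperature, dim=-1),
            reduction="batchmean",
        ) * self.temperature**2
        loss = self.alpha * outputs.loss + (1 - self.alpha) * kd
        return (loss, outputs) if return_outputs else loss
```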
#KnowledgeDistillation #Distillation #openai #keras #tutorial #course #freecourses #huggingface #Nvidia #pytorch