Kubernetes is currently one of the most exciting technologies in the DevOps world. A lot of hype has built up around it for one simple reason: the mighty containers.
https://luminousmen.com/post/kubernetes-101
It [AI] wears a cloak of black velvet; and a cowl, covering its face. It sits, on a throne made of dry bones of the dead, at the center of a large hall...
— GPT-3
A short science-fiction story in which part of the text and the dialogue at the end were written by the GPT-3 neural network. Around the edges there is a conversation between the author and the network, which to me is the most interesting part.
https://jamesyu.org/singular/
#ds
Researchers from Google Research have introduced a whole family of scalable and efficient object detectors called EfficientDet.
Tests have shown that the new models achieve accuracy comparable to their predecessors while being up to 9 times smaller and using less computing power.
https://ai.googleblog.com/2020/04/efficientdet-towards-scalable-and.html
#ml #ds
Facebook has developed a system called TransCoder, whose main task is to translate code from one programming language to another using deep learning. It can currently translate functions between C++, Java, and Python 3.
Now it's easy to move to Python from Java ;)
https://ai.facebook.com/blog/deep-learning-to-translate-between-programming-languages/
#python #ml
Researchers fed raw data from the (now retired) Kepler telescope into an ML model previously trained to recognize exoplanets, and got back 50 potential new exoplanets that were previously unknown.
https://www.cnn.com/2020/08/26/tech/ai-new-planets-confirmed-intl-hnk-scli-scn/index.html
#ml
An interesting story from an ex-Googler about GCP's lack of support. Many f-words, which is understandable to me: I don't like the platform myself.
https://medium.com/@steve.yegge/dear-google-cloud-your-deprecation-policy-is-killing-you-ee7525dc05dc
NVIDIA has introduced MONAI, a PyTorch-based framework designed to help integrate deep learning solutions into healthcare imaging projects.
https://github.com/Project-MONAI/MONAI
#ml
A long review of how TikTok uses machine learning to increase user engagement and pierce the "filter bubble". Nothing really fancy, but it is interesting to read about the problems they are trying to solve.
https://www.axios.com/inside-tiktoks-killer-algorithm-52454fb2-6bab-405d-a407-31954ac1cf16.html
AWS has released Bottlerocket, a Linux-based operating system. It is an open-source project, developed by AWS as a minimal host OS for running containers. The general idea is that today containers are usually run on general-purpose operating systems, which does not help with security or with atomic updates.
https://aws.amazon.com/blogs/opensource/announcing-the-general-availability-of-bottlerocket-an-open-source-linux-distribution-purpose-built-to-run-containers/
Not long ago I was recommending Pi-hole, but an RCE exploit has been discovered in the Pi-hole software. This particular problem requires authenticated access to the Pi-hole administrative web interface, so it is unlikely to cause too many problems on its own, but still.
https://frichetten.com/blog/cve-2020-11108-pihole-rce/
#privacy
If you want to remove ads and tracking without any extensions, check out Pi-hole on a Raspberry Pi Zero.
https://youtu.be/KBXTnrD_Zs4
#privacy
Famous in-memory data format
Apache Arrow is the holy grail of analytics, invented not so long ago. It is a special format for storing columnar data in memory. It allows you to copy objects from one process to another very quickly: from pandas to PyTorch, from pandas to TensorFlow, from CUDA to PyTorch, from one node to another node, etc. This makes it the workhorse of a large number of frameworks for both analytics and big data.
I actually don't know of any other in-memory format that combines complex data, dynamic schemas, performance, and broad platform support.
Apache Arrow itself is not a storage or execution engine. It is designed to serve as a foundation for the following types of systems:
- SQL execution engines (Drill, Impala, etc.)
- Data analysis systems (Pandas, Spark, etc.)
- Streaming and queueing systems (Kafka, Storm, etc.)
- Storage systems (Parquet, Kudu, Cassandra, etc.)
- Machine learning libraries (TensorFlow, Petastorm, Rapids, etc.)
Please do not think that this is part of the Parquet format or part of PySpark. It is a separate, self-contained format which I think is a bit undervalued and should be taught alongside all the other big data formats.
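To make the hand-off between systems more concrete, here is a minimal Python sketch (assuming the pyarrow and pandas packages are installed) that turns a pandas DataFrame into an Arrow table, serializes it with the Arrow IPC stream format, and reads it back the way another process or node would:

import pandas as pd
import pyarrow as pa

df = pd.DataFrame({"id": [1, 2, 3], "score": [0.1, 0.5, 0.9]})

# Convert the pandas DataFrame to an Arrow table (columnar, in-memory).
table = pa.Table.from_pandas(df)

# Serialize the table into the Arrow IPC stream format; this is the same
# buffer another process or another node could consume without re-parsing.
sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, table.schema) as writer:
    writer.write_table(table)
buf = sink.getvalue()

# Read it back; the columns are reconstructed directly from the buffer.
with pa.ipc.open_stream(buf) as reader:
    restored = reader.read_all()

print(restored.to_pandas())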
https://arrow.apache.org/overview/
#big_data
Where do I start to learn AWS?
So, if you go to the AWS Documentation you will see an endless list of services, but it is just a global table of contents pointing at more tables of contents! That's right: Amazon is huge right now. At the time of writing there are about 250 services under the hood. It is not realistic to learn them all, and there is no reason to do it at all.
John Markoff said, "The Internet is entering its Lego era." AWS services are similar to Lego: you find the right pieces and combine them together. To highlight the most essential pieces, it is reasonable to start with the ones that came first historically. They are:
- S3 — storage
- EC2 — virtual machines + EBS drives
- RDS — databases
- Route53 — DNS
- VPC — network
- ELB — load balancers
- CloudFront — CDN
- SQS/SNS — messages
- IAM — main access rights to everything
- CloudWatch — logs/metrics
Then there are modern serverless pieces (Lambda, DynamoDB, API Gateway, CloudFront, IAM, SNS, SQS, Step Functions, EventBridge).
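As a toy illustration of combining these Lego pieces, here is a minimal Python sketch using boto3 (the bucket and queue names are hypothetical, and it assumes AWS credentials are already configured) that drops an object into S3 and then announces it via SQS:

import json
import boto3

# Hypothetical names, replace with your own resources.
BUCKET = "my-example-bucket"
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-example-queue"

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

# S3: store a small report object.
s3.put_object(
    Bucket=BUCKET,
    Key="reports/latest.json",
    Body=json.dumps({"status": "ok"}),
)

# SQS: tell downstream consumers that a new report is available.
sqs.send_message(
    QueueUrl=QUEUE_URL,
    MessageBody=json.dumps({"key": "reports/latest.json"}),
)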
#aws
Rapids
Nvidia has been developing an open-source platform called Rapids, whose goal is to accelerate data processing and machine learning algorithms on the GPU. Developers using Rapids don't have to switch between different libraries: they just write Python code, and Rapids runs it on the GPU. All data is stored in memory in the Apache Arrow format.
I already wrote about GPU vs CPU. The problem is that CPU memory can now reach terabytes, while a GPU has a maximum of about 50 GB of memory. Here Dask comes to the rescue: integration with Dask gives Rapids GPU clusters with multi-GPU support.
The Rapids repository has the cuDF library, a pandas-like DataFrame for data preparation on the GPU, and the cuML library, which lets you develop machine learning algorithms without going into the details of CUDA programming.
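For a feel of what "just write Python" means here, a minimal sketch with cuDF (assuming a CUDA-capable GPU and the cudf package are available): the API mirrors pandas while the data lives in GPU memory.

import cudf

# A pandas-like DataFrame, but the columns live in GPU memory (Arrow layout).
df = cudf.DataFrame({"user": [1, 1, 2, 2, 3],
                     "amount": [10.0, 5.0, 3.0, 7.0, 1.0]})

# Familiar pandas-style operations, executed on the GPU.
totals = df.groupby("user").sum().reset_index()

# Move the (small) result back to the CPU as a regular pandas DataFrame.
print(totals.to_pandas())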
Sounds cool, doesn't it? But there is always a "but":
- it's still not production ready
- porting any complex UDF is very hard (at a minimum you should know CUDA, which I don't)
- no CPU version of the libraries for inference
- no automatic memory management
- it's Nvidia only
https://github.com/rapidsai
#ml
CPU vs GPU
https://youtu.be/-P28LKWTzrI