Give me 10 seconds to explain what ML engineers are doing
A very cool talk on why and how everything (doesn't) work, and what to do about it:
https://youtu.be/xA5U85LSk0M
#dev
YouTube
How Your Systems Keep Running Day After Day - John Allspaw
How Your Systems Keep Running Day After Day: Resilience Engineering as DevOps
John Allspaw, CTO/Researcher, Adaptive Capacity Labs
DOES17 San Francisco
DevOps Enterprise Summit
https://events.itrevolution.com/us/
My goal today is twofold. One, I'm intending…
The UK fucked up its COVID-19 statistics from September 25 to October 2 because daily case data was stored in an Excel spreadsheet: each case was put in a separate column, and at some point the sheet ran out of columns. Because of that, the data was not loaded into the official dashboard in real time, and the British found out about the huge increase in infections when it was already too late.
P.S. As a solution, they are thinking of keeping several Excel spreadsheets. Lol
https://www.dailymail.co.uk/news/article-8805697/Furious-blame-game-16-000-Covid-cases-missed-Excel-glitch.html
Mail Online
Blame game after 16,000 Covid cases missed due to Excel glitch
A clearer picture of the country's outbreak has emerged after some 16,000 confirmed infections had to be added to the daily totals running back more than a week.
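A cheap guard against this kind of silent truncation is to compare the shape of the data with the worksheet limits before exporting. Below is a minimal sketch: the limits are Excel's documented worksheet sizes, while the function name and the sample numbers are made up for illustration.

```python
# Documented worksheet limits: (max rows, max columns).
EXCEL_LIMITS = {
    "xls": (65_536, 256),         # legacy Excel 97-2003 format
    "xlsx": (1_048_576, 16_384),  # Excel 2007 and later
}


def fits_in_sheet(n_rows: int, n_cols: int, fmt: str = "xlsx") -> bool:
    """Return True if a table of this shape fits into a single worksheet."""
    max_rows, max_cols = EXCEL_LIMITS[fmt]
    return n_rows <= max_rows and n_cols <= max_cols


# Hypothetical export with one column per case, as described in the post.
print(fits_in_sheet(n_rows=10, n_cols=16_000, fmt="xlsx"))  # True
print(fits_in_sheet(n_rows=10, n_cols=17_000, fmt="xlsx"))  # False
```

Failing loudly when the check returns False is the whole point: the export should stop instead of dropping the overflow.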
Cool writeup on zero-shot and few-shot learning techniques in NLP. It's hard to build a model that performs well on unseen data without fine-tuning it; the author comprehensively explains the methods and gives examples of how this problem can be solved.
https://joeddav.github.io/blog/2020/05/29/ZSL.html
#ds
Joe Davison Blog
Zero-Shot Learning in Modern NLP
State-of-the-art NLP models for text classification without annotated data
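One of the simplest ideas in this space is to embed the text and the candidate label names in the same vector space and pick the closest label, with no labeled examples involved. The sketch below only shows the mechanics: the 3-dimensional "word vectors" are invented for the example, and a real system would plug in a pretrained sentence encoder instead.

```python
import math

# Toy 3-dimensional "word vectors", invented for this example.
TOY_VECTORS = {
    "football": (0.9, 0.1, 0.0),
    "match": (0.8, 0.2, 0.1),
    "sports": (1.0, 0.0, 0.0),
    "election": (0.0, 0.9, 0.1),
    "vote": (0.1, 0.8, 0.0),
    "politics": (0.0, 1.0, 0.0),
    "gpu": (0.0, 0.1, 0.9),
    "technology": (0.0, 0.0, 1.0),
}


def embed(text):
    """Average the vectors of known words; unknown words are ignored."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return (0.0, 0.0, 0.0)
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def zero_shot_classify(text, labels):
    """Pick the label whose embedding is closest to the text embedding."""
    text_vec = embed(text)
    return max(labels, key=lambda label: cosine(text_vec, embed(label)))


labels = ["sports", "politics", "technology"]
print(zero_shot_classify("the football match was great", labels))    # sports
print(zero_shot_classify("who will win the election vote", labels))  # politics
```

The classifier never sees a labeled example; all the "knowledge" lives in the shared embedding space, which is why the quality of the encoder matters so much.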
Technical debt
Any more or less experienced engineer has, more than once, ended up in a situation where "dirty" work had to be done: writing code that is intentionally not scalable and not decomposed, deliberately skipping tests, deploying manually, hardcoding configuration. Because it is "temporary". Because "we have to ship something now, we will fix it later". Cut corners, code smells, undocumented changes: all of this accumulates over time. Eventually it becomes so hard to fix that starting from a clean slate looks easier.
There is no escape from this, and a small amount of technical debt will haunt you until you retire. To somehow live with it, you can apply the same "golden" 80/20 rule. If you work in a project team, allocate 80% of resources to the project and 20% to "paying off" the debt. You are both the creditor and the payer here, and if you miss a couple of "payments" (for whatever reason), you end up in bondage, where it is the other way around: 80% of the time goes to dealing with technical debt and only 20% to the project.
A useful article about the causes and consequences of technical "debt" and ways to avoid it: https://www.extremeuncertainty.com/technical-debt-technical-bankruptcy
#dev
Extremeuncertainty
Technical debt – or technical bankruptcy? | Extreme Uncertainty
It's time we admit we have a problem with technical debt. Everyone knows what it is, everyone is talking about, but not enough is being done about it. Time and again I have seen teams and systems end up swamped in technical debt. Swimming and eventually drowning…
Privacy is an emerging topic in the machine learning community. There are no canonical guidelines for producing a private model, while a growing body of research shows that a machine learning model can leak sensitive information about its training dataset, creating a privacy risk for the users whose data is in it.
Cost-efficient “membership inference attacks” predict whether a specific piece of data was used during training. If an attacker can make this prediction with high accuracy, they can tell which records were part of the training set, and that is exactly the leak. The biggest advantage of a membership inference attack is that it is easy to perform, i.e., it does not require any re-training.
A few years ago, Cornell researchers investigated the privacy properties of machine learning models. An interesting read:
https://www.cs.cornell.edu/~shmat/shmat_oak17.pdf
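To make the idea concrete, here is a toy sketch of the simplest variant, a confidence-threshold attack. It is not the shadow-model attack from the linked paper: the "model" is a made-up stand-in that is overconfident on memorized points, and the data, the threshold, and all names are assumptions for illustration.

```python
import math
import random

random.seed(0)

# Hypothetical private training set and unseen points from the same distribution.
train = [random.gauss(0, 1) for _ in range(50)]
unseen = [random.gauss(0, 1) for _ in range(50)]


def model_confidence(x):
    """Stand-in for an overfitted model: confidence peaks on memorized points."""
    nearest = min(abs(x - t) for t in train)
    return math.exp(-50 * nearest)


def guess_is_member(x, threshold=0.9):
    """The attack: query the model and flag suspiciously confident answers."""
    return model_confidence(x) >= threshold


hits_on_members = sum(guess_is_member(x) for x in train)
false_alarms = sum(guess_is_member(x) for x in unseen)
print(f"flagged {hits_on_members}/50 training points, {false_alarms}/50 unseen points")
```

The attacker only queries the model and never retrains it; the gap between confidence on seen and unseen data does all the work, which is why overfitted models are the easiest targets.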
Definitely panic if there's caviar
I came across Valve's handbook for new employees and got a little carried away reading it. It is so well written that at times I wanted to steal pieces of the text.
I advise you to take a look at it, it's very interesting.
We usually don't do any formalized employee "development" (course work, mentor assignment), because for senior people it's mostly not effective. We believe that high-performance people are generally self-improving.
https://steamcdn-a.akamaihd.net/apps/valve/Valve_NewEmployeeHandbook.pdf
#stuff
Daily Standup
Scrum (one of the Agile frameworks) has a procedure called the "Daily Scrum" (or Daily Standup). It is a short team meeting where everyone talks about what they achieved yesterday and what they will be doing today. It is meant to synchronize everyone with each other and to give people a place to raise critical questions.
The name "standup" comes from the fact that at this meeting everyone, as you guessed, is standing, which is supposed to make the meeting go faster. The theory is that people get tired of standing and want to finish as soon as possible and run away, whereas when you sit at a meeting you are comfortable, and a lot of time is spent on chitchat.
False. Yes, people get tired, but that does not make them deliver information more concisely. And if you have 10 people (everyone is supposed to speak for no longer than a minute or two), the last one on the list will be so tired that they will say just a few words in order to finally sit their ass down on a comfortable, soft chair. As a result, we have a meeting that everyone hates and does not want to go to.
Scrum requires some discipline and self-control. It is perfect for small teams with a new product whose future is not clear. In turn, it is completely unsuitable for large monolithic projects and for engineering teams, especially reactive ones that work on incoming tasks.
There are a lot of disputes about Scrum done right and Scrum done wrong, about the importance of the Scrum Master role and who should be given it, but I have not seen any discussion of morning standups, although I often meet people who do not like them.
For those who missed the article on Scrum: https://luminousmen.com/post/11-steps-of-scrum
David Beazley recently launched Practical Python Programming, a course on Python that he created and taught for 13 years. Definitely recommended for newbies 👌
#python
practical-python
Welcome!
Practical Python Programming (course by @dabeaz)
DeepMind shared their new curated list of learning resources for many different areas of DS, ML and AI
https://storage.googleapis.com/deepmind-media/research/New_AtHomeWithAI%20resources.pdf
#ds #ml
Can’t say it better than Google when it comes to MLOps
https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
#ml
Google Cloud
MLOps: Continuous delivery and automation pipelines in machine learning | Cloud Architecture Center | Google Cloud
Discusses techniques for implementing and automating continuous integration (CI), continuous delivery (CD), and continuous training (CT) for machine learning (ML) systems.
Backpressure
This is the concept I omitted completely in my book on asynchronous programming.
Backpressure is when the progress of turning that input to output is resisted in some way. In most cases that resistance is computational speed — trouble computing the output as fast as the input comes in — so that’s by far the easiest way to look at it. But other forms of backpressure can happen too: for example, if your software has to wait for the user to take some action.
More in this cool talk:
https://youtu.be/I6eZ4ZyI1Zg
#dev
YouTube
ReactiveConf 2019 - Jay Phelps: Backpressure: Resistance is NOT Futile
Oct 30 - Nov 1, 2019, Prague, Czech Republic. https://reactiveconf.com/ Discovery stage. Reactive...
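One common way to deal with backpressure is to let the slow side push back on the fast side through a bounded buffer. Here is a minimal asyncio sketch of that idea; the function names, the queue size, and the simulated delay are assumptions made for the example and are not taken from the talk.

```python
import asyncio


async def producer(queue, n_items):
    for i in range(n_items):
        # put() suspends while the queue is full: this is the backpressure.
        await queue.put(i)
    await queue.put(None)  # sentinel: no more items


async def consumer(queue, results):
    while True:
        item = await queue.get()
        if item is None:
            break
        await asyncio.sleep(0.01)  # simulate slow processing
        results.append(item * 2)


async def main():
    queue = asyncio.Queue(maxsize=5)  # the bound is what creates backpressure
    results = []
    await asyncio.gather(producer(queue, 20), consumer(queue, results))
    return results


results = asyncio.run(main())
print(len(results))  # 20
```

With an unbounded queue the producer would finish immediately and all 20 items would pile up in memory; with the bound, the producer can never run more than 5 items ahead of the consumer.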
Hey everyone!
Please leave your shoes at the door and make yourselves comfy on the couch. Tea, coffee, or maybe a beer. All settled? Great, let’s get started!
So, things are changing a bit in my world. I’m updating how I communicate on Telegram and my blog. Expect more personal opinions and thematic content (don’t worry, the memes are staying). Plus, I’m adding more interactive elements so I can get some feedback. There will be shorter but more frequent blog posts, and longer, more thoughtful ones on Telegram.
I’ll also organize everything with tags for easier browsing. Here’s what I’ll focus on:
- #python: Python development
- #big_data: All things big data
- #ml: Machine learning and related topics
- #soft_skills: Soft skills, leadership, and more
- #dev: All types of things related to software development
I haven’t really promoted this channel before, just quietly sharing stuff I find interesting. Now, I want it to grow and evolve with you all.
Hope you’re excited about the changes!
On the internet no one knows you’re a cat. Another view on the topic of personal privacy.
https://luminousmen.com/post/clean-up-your-digital-hygiene
#privacy
Blog | iamluminousmen
Clean up your digital hygiene
Since childhood, we know that when we come from the street, we have to wash our hands. However, we do not really think about what to do after surfing online
Remote AWS Certification Exam
Not long ago I passed the AWS Certified Solutions Architect - Associate exam and want to share my experience.
You can take the exam either at a certified test center or online at home (via the Pearson VUE app). Since no test center near me was open on weekends, I decided to take the exam at home.
There are detailed requirements and guidelines for taking the exam, and in general they are quite reasonable. You need a laptop or PC with an Internet connection and a webcam. Test the connection and its speed in advance. There must be no notes, gadgets, or other switched-on screens near you during the exam (even headphones are not allowed!). If possible, the windows should be closed. The door must be shut, and no one may enter the room for the duration of the exam. During the exam, keep your eyes on the monitor so that the examiner has no complaints. You will not be able to go to the toilet or have a snack during the exam.
The exam takes 140 minutes and consists of 65 questions. Most often you need to pick one option out of four, although some questions ask for two options out of four or two out of six. The questions are mostly long and describe a typical scenario for which you have to choose the right solution from the AWS world. The passing score is 72%.
To take the test you need to install a special app that lets the examiner monitor your screen, camera, and sound during the test. I advise you to do this in advance. All this information is available before the test at pearsonvue.com.
Frankly, I could not start the exam on my first attempt: the examiner (Pearson VUE calls them a proctor) never showed up, even though I waited for about an hour. Judging by the feedback on Reddit, this happens quite often, so be prepared for it. After I wrote to support, I was allowed to reschedule the exam.
So on the day of the exam, 15 minutes before the appointed time, I opened the Pearson VUE application and started filling in the required fields. To confirm your identity, you must take a photo of your driver's license or passport. Interestingly, you can take the photo either with your phone or with the webcam. Mostly out of curiosity, I chose to take the pictures with my phone camera. A couple of seconds later I received a link by SMS. Following the prompts, I took a picture of my license and then pictures of the room from all four sides. A couple of seconds after the final confirmation on the phone, the screen on the laptop changed to say that everything was ready for the test.
After about five minutes the examiner wrote to me in a chat and then called me. Before we began, I was asked to remove the documents from the table, because nothing may be left on it, and then to pan the laptop camera around the room, I suppose to make sure that everything matched the photos received earlier. I was wished good luck and the exam started.
The question interface felt unusual at first, but then I got absorbed in the process and stopped paying attention to how it looked. I messed up the timing a little: I spent too much time reading and thinking about the first half of the questions, so I had to rush through the second half and sometimes did not even finish reading a question in order to make it to the end.
In the end, after more than two hours of intense thinking, I could finally relax. The examiner did not contact me again; the exam simply ended by itself. At the end you see the most important thing: whether or not you passed. This happens automatically, but the actual score arrives a couple of days later.
A couple of days later I received a nice message: "Congratulations, You are Now AWS Certified".
#aws
Pearsonvue
Certification Exams & Testing - Pearson VUE
Schedule your certification exam with Pearson VUE. Explore resources and find a testing center near you.
DataFrames can be partially cached, but partitions cannot be partially cached. When you use cache() or persist(), the DataFrame is not fully cached until you invoke an action that goes through each record, e.g. count(). So if an action like take(1) is used, only one partition will be cached, because Catalyst understands that you don't need to calculate all partitions just to get one record.
#spark
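A quick way to see this for yourself is a small local PySpark session with the Storage tab of the Spark UI open. The sketch below assumes PySpark is installed and runs in local mode; the function name, app name, row count, and partition count are arbitrary choices for the demo.

```python
def demo_partial_cache():
    # Imported lazily so the sketch can be loaded without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[4]")
        .appName("partial-cache-demo")
        .getOrCreate()
    )

    # 8 partitions, so partial caching is easy to spot.
    # cache() is lazy: nothing is stored until an action runs.
    df = spark.range(0, 1_000_000, numPartitions=8).cache()

    df.take(1)   # computes only as many partitions as one row requires
    # Storage tab at this point: "Fraction Cached" is below 100%.

    df.count()   # goes through every record in every partition
    # Storage tab at this point: the DataFrame is fully cached.

    df.unpersist()
    spark.stop()
```

To actually inspect the Storage tab (by default at http://localhost:4040), add a pause such as input() after each action, since the UI disappears once the session is stopped.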