SpaceX uses Chromium and JavaScript for the Dragon 2 flight interface. Now I can officially say that frontend is rocket science these days.
Check out an exact replica of the interface at the link below
https://iss-sim.spacex.com/
#stuff
Spark 3.0 now on Databricks
https://databricks.com/blog/2020/06/18/introducing-apache-spark-3-0-now-available-in-databricks-runtime-7-0.html
#spark #big_data
I think it's clear now who the winner is
With that said, it seems obvious that the standard data scientist tech stack will change in the near future. For example, pandas can be replaced fairly easily by Koalas. By the way, the new Koalas is here and it covers 80% of the pandas API - https://github.com/databricks/koalas/releases/tag/v1.0.0
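To give a sense of how close the two APIs are, here is a minimal sketch (my example, assuming a Spark-enabled environment with the koalas package installed):

```python
# A minimal sketch of Koalas mirroring the pandas API on top of Spark.
# Assumes a Spark-enabled environment and `pip install koalas`.
import databricks.koalas as ks

# The same idioms as pandas, but executed distributed on Spark
df = ks.DataFrame({"user": ["a", "b", "a"], "amount": [10, 20, 30]})
totals = df.groupby("user")["amount"].sum()
print(totals.sort_index())
```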
#spark #big_data
Technological degradation
Technology degrades from generation to generation, because no project starts from nothing - you always choose tools, libraries, etc. to solve a problem. Of course, you could do everything from scratch in assembly, but nobody does that because it makes no sense. And when you take library X, which depends on libraries Y and Z, it doesn't mean you know what Y and Z do - you only know what X does and how to use it. So while any problem in IT can be solved by adding a level of abstraction, that abstraction stack is extremely hard for one person to know, and without proper communication between old and new developers, those connections are lost. Each generation builds at its own abstraction level without understanding what's going on below.
https://youtu.be/pW-SOdj4Kkk
#dev
Data Lake and Data Warehouse. They seem similar, but there are differences.
https://luminousmen.com/post/data-lake-vs-data-warehouse
Nice advice on how to read research papers, from Andrew Ng's CS230 lectures on Deep Learning
https://deeps.site/blog/2019/10/14/reading-research-papers-career-advice/
#ml #ds
AutoML
We should understand that ML models are not static - as soon as the data changes, so do the models and their predictions, so it is necessary to constantly monitor ML pipelines, retrain, optimize and so on. These are all "time series" problems that engineers and data scientists have to solve, and they are not trivial from many points of view. Solutions may have huge time horizons, but the worst part is that they need to be maintained afterwards. Eww. As engineers, we love to create things, but we don't want to maintain them. To automate data preprocessing, feature engineering, model selection and configuration, and the evaluation of results, the AutoML process was invented. AutoML can automate these tasks by providing a baseline result, can provide high quality for certain problems, and can give an understanding of where to continue research.
It sounds great, of course, but how effective is it? The answer depends on how you use it. It's about understanding what people are good at and what machines are good at. People are good at connecting existing data to the real world - they understand the business domain, they understand what specific data means. Machines are good at calculating statistics, storing and updating state, and running repetitive processes. Tasks like exploratory data analysis, data preprocessing, hyper-parameter tuning, model selection and putting models into production can be automated to some extent with automated machine learning frameworks, but good feature engineering and drawing actionable insights require a human data scientist who understands what they are doing. By separating these activities, we can easily benefit from AutoML now, and I think that in the future AutoML will replace most of the work of a data scientist.
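As an illustration of what that automated part looks like in practice, here is a minimal sketch using TPOT - one AutoML framework among many, my choice for illustration rather than anything named in this post:

```python
# A minimal sketch of an AutoML workflow, using TPOT as one example
# framework (my choice for illustration). Assumes: pip install tpot
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# TPOT searches over preprocessing steps, models and hyper-parameters -
# exactly the repetitive work the machine side is good at.
automl = TPOTClassifier(generations=5, population_size=20,
                        random_state=42, verbosity=2)
automl.fit(X_train, y_train)
print(automl.score(X_test, y_test))

# Export the winning pipeline as plain sklearn code - a baseline result
# that shows where to continue research.
automl.export("best_pipeline.py")
```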
Many data scientists say that human data scientists will still be necessary after AutoML, but I doubt it. I am not talking about specialized tasks like squeezing out maximum model accuracy, or about research - I am talking about real business problems. And here I think it is obvious that AutoML will win. Not many projects in the real world make it from POC to production, and automation will help build quick prototypes and eventually increase ROI for the company.
What's more, I think it's noticeable that the industry is undergoing a strong evolution of ML platform solutions (e.g. Amazon SageMaker, Microsoft Azure ML, Google Cloud ML, etc.), and as ML adoption grows, many enterprises are quickly moving to ready-to-use DS & ML platforms to accelerate time to market, reduce operating costs and improve success rates (the number of ML models deployed and commissioned).
#ml
As part of writing a longread on privacy, I am looking at some interesting software that helps you stay private on the internet. Over the next few days there will be a little thread about what I use or what I like.
Trace - a browser extension that spoofs your browser's fingerprints. In addition to Canvas, it can spoof audio fingerprints, WebGL, the User-Agent, the graphics card and processor model, etc.
https://github.com/jake-cryptic/AbsoluteDoubleTrace/
#privacy
DeepFaceDrawing: Deep Generation of Face Images from Sketches
Done using deep image-to-image translation techniques. Really cool work, but some of the resulting images are quite scary if you look closely at the facial details.
https://youtu.be/HSunooUTwKs
Original paper: http://geometrylearning.com/paper/DeepFaceDrawing.pdf
#ds #ml
Who cares about a statistically insignificant Stack Overflow report when there's another PornHub annual report? For those who, like me, are missing out
https://www.pornhub.com/insights/2019-year-in-review
Anonymous Telegram Bot. The bot acts as a secure wall between you and other users, keeping you anonymous while letting you send text replies to them. The idea behind this bot is for people who own public channels and don't want to expose their private profiles to everybody.
https://github.com/fndh/Anonymous-Telegram-Bot
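The core relay idea behind such a bot can be sketched in a few lines - this is my illustration of the concept, not the linked project's actual code; the token and owner id are placeholders:

```python
# A minimal sketch of the anonymous-relay concept (not the linked
# project's code). Assumes: pip install python-telegram-bot==13.*
# TOKEN and OWNER_ID are placeholders you would fill in yourself.
from telegram.ext import Updater, MessageHandler, Filters

TOKEN = "123456:ABC-hypothetical-token"
OWNER_ID = 123456789  # the channel owner's Telegram user id

def relay(update, context):
    msg = update.message
    if msg.from_user.id != OWNER_ID:
        # Strangers' messages are forwarded to the owner; they only
        # ever see the bot, never the owner's profile.
        msg.forward(OWNER_ID)
    elif msg.reply_to_message and msg.reply_to_message.forward_from:
        # The owner replies to a forwarded message; the bot sends the
        # text back to the original sender under its own identity.
        context.bot.send_message(msg.reply_to_message.forward_from.id, msg.text)

updater = Updater(TOKEN)
updater.dispatcher.add_handler(MessageHandler(Filters.text & ~Filters.command, relay))
updater.start_polling()
updater.idle()
```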
#privacy
S3 vs HDFS
I am very annoyed that all sorts of big data developers confuse S3 and HDFS, assuming that S3 is somehow related to HDFS.
That's not true.
HDFS is a distributed file system designed to store big data; it runs on physical machines that can also run other things. S3 is AWS's object storage, and it has nothing to do with storing files: all data in S3 is stored as object entities, each associated with a key (the object name), a value (the object's content) and a VersionID. There is nothing else you can do in S3, because it is not a file system. S3 has "presumably" unlimited storage in the cloud, while HDFS does not. S3 performs deletions and modifications of records in an eventually consistent way.
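To make the key/value point concrete, here is a minimal sketch using boto3 (the bucket and key names are hypothetical, and it assumes configured AWS credentials):

```python
# A minimal sketch of the S3 object model described above, using boto3.
# Assumes: pip install boto3 plus configured AWS credentials; the bucket
# and key names are hypothetical.
import boto3

s3 = boto3.client("s3")

# Every "record" is just key + value; there are no files or directories
# underneath, only object entities.
s3.put_object(
    Bucket="my-data-lake",
    Key="events/2020/06/part-0001.json",
    Body=b'{"event": "click"}',
)

resp = s3.get_object(Bucket="my-data-lake", Key="events/2020/06/part-0001.json")
print(resp["Body"].read())
print(resp.get("VersionId"))  # populated only if bucket versioning is enabled
```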
There are many other criteria like cost, SLA, durability and elasticity (you can set up custom lifecycle rules and version control over objects). But let's not dwell on them - S3 wins there anyway.
Hadoop and HDFS made it cheap to store and distribute large amounts of data. But now that everyone is moving to cloud architectures, the benefits of HDFS are minimal and not worth the complexity it brings. That's why, now and in the future, organizations will use S3 as the backend for their data storage solutions.
#big_data
Get Google search results, but without ads, JavaScript, AMP links, cookies or IP tracking. Easily deployed as a one-click Docker application and customized with a single configuration file. Quick and easy to set up as your primary search engine replacement on both desktop and mobile.
https://github.com/benbusby/whoogle-search
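Once self-hosted, it behaves like a plain web endpoint. A minimal sketch of querying a local instance from Python - the localhost URL, port and /search path are my assumptions for a default local deployment, so adjust to yours:

```python
# A minimal sketch of querying a self-hosted Whoogle instance.
# The URL and port assume a default local deployment - adjust to yours.
import requests

resp = requests.get(
    "http://localhost:5000/search",
    params={"q": "data lake vs data warehouse"},
)
resp.raise_for_status()
print(resp.text[:500])  # raw HTML of the ad-free results page
```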
#privacy
HTTPS Everywhere. This is an extension made by the EFF and the Tor Project that forces a website to use its HTTPS version, if available. Many websites still default to HTTP or contain internal links that lead you to the unprotected version of the site.
https://www.eff.org/https-everywhere
#privacy
Bitwarden is a very easy-to-use yet fully functional password manager with clients for all platforms. It can easily compete with giants such as LastPass and 1Password, but unlike them it is fully open source and doesn't ask for money. Bitwarden uses strong end-to-end encryption, so no one else has access to your passwords. It also offers both a cloud-hosted and an on-premise version(!), so you can host it yourself.
https://bitwarden.com/
#privacy