LuminousmenBlog
(ノ◕ヮ◕)ノ*:・゚✧ ✧゚・: *ヽ(◕ヮ◕ヽ)

helping robots conquer the earth and trying not to increase entropy using Python, Data Engineering and Machine Learning

http://luminousmen.com

License: CC BY-NC-ND 4.0
Numbers everyone should know
Notebooks have traditionally been a tool for drafting code and avoiding repeated expensive computations while exploring solutions. However, with new tools like nteract's papermill and scrapbook packages, notebooks can now also serve as reusable, parameterizable templates for execution.

It's very handy if:
1. You are too lazy to rewrite a Jupyter notebook as an executable script but still want to encapsulate the work and reuse it
2. You want to generate automatic reports
3. You want to create a common environment for data scientists/analysts to run their stuff
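A minimal sketch of the automatic-reports case. It assumes a hypothetical template notebook report_template.ipynb with a cell tagged "parameters" that defines default values for region and date; papermill injects the overridden values right after that cell. The file names, the region/date parameters and the reports/ folder are illustrative, not part of papermill.

```python
from pathlib import Path

# Hypothetical template: it must contain a cell tagged "parameters"
# that defines defaults for `region` and `date`.
TEMPLATE = "report_template.ipynb"


def plan_runs(regions, date, out_dir="reports"):
    """Build one (output_path, parameters) pair per report to generate."""
    runs = []
    for region in regions:
        out = Path(out_dir) / f"report_{region}_{date}.ipynb"
        runs.append((str(out), {"region": region, "date": date}))
    return runs


def execute_runs(runs, template=TEMPLATE):
    """Execute the template once per run, saving each executed copy as a report."""
    import papermill as pm  # imported lazily so plan_runs works without papermill

    for output_path, params in runs:
        Path(output_path).parent.mkdir(parents=True, exist_ok=True)
        pm.execute_notebook(template, output_path, parameters=params)


if __name__ == "__main__":
    execute_runs(plan_runs(["eu", "us"], "2020-01-31"))
```

Each output notebook keeps the executed cells with their outputs, so it doubles as the report itself.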

#ml #python
The main quality of a leader

Leadership is a very complex and blurry concept. Parents, teachers, bosses, commanders, captains - they are all leaders. But there is one important quality that distinguishes a good leader from a mediocre one: a good leader always protects his people.

- A team manager who does not let a stupid marketer get fired after he fucked up in front of the board of directors is a good manager.
- A client calls the manager to complain about a waiter: a good manager keeps the client from hitting his employee and takes the hit himself.
- A team leader in a software engineering team takes the blame for all of the team's mistakes and credits the team for the good decisions, even if he made them himself.

A good manager always protects his people. He may scold them afterwards if necessary, but he never lets anyone from outside hit them directly.

#dev #soft_skills
Thank you for staying in this cozy channel despite its not-very-good English. I hope it never grows up 😉
Continuous integration and continuous delivery are like vectors with the same direction but different magnitudes. Their purpose is the same: to make software development more reliable and to speed up development and release cycles.

https://luminousmen.com/post/continuous-Integration-continuous-delivery
DuckDuckGo

DuckDuckGo (http://duckduckgo.com/) is a search engine that positions itself as completely anonymous: it does not collect information about you, does not store your search history, etc.

Try using DuckDuckGo at home for a couple of days; the difference is very noticeable. Sometimes, though, it sucks at searching for coding errors.
DuckDuckGo has ads, but unlike Google's, they are based only on the current query, not on a profile collected from all Google sites. DuckDuckGo gives you a more objective Internet that is the same for everyone rather than assembled individually for you. And I like it.

#usefullinks #privacy
AWS Lambda Abuse

When you deploy an endpoint that is open to the world, you open it not only for use but also for abuse.

AWS provides services that guard against common abuse methods, such as AWS Shield, which mitigates DDoS attacks. But even Shield doesn't know what is and isn't abusive for your particular application.

Of course, if your Lambda function is private, you should use one of the API Gateway security mechanisms to prevent abuse:

- IAM permissions
- API keys
- Custom (Lambda) authorizers

If one of them is present, the Lambda function can only be called by authorized users. Cool!

But what to do when your Lambda is open to the public?

Not much, unfortunately.

One factor you might want to control is concurrency: the number of simultaneous executions allowed per account and per function. You are billed for each request plus the compute time multiplied by the memory allocated to the function, so concurrency is the unit you want to control. Here is a pretty cool post about this. By the way, you can control this even at the Zappa settings level(!): check lambda_concurrency.
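A rough sketch of capping a public function's concurrency with boto3's put_function_concurrency, plus back-of-the-envelope arithmetic for the worst-case bill that cap implies. The prices are assumptions (on-demand x86 rates at the time of writing, ignoring the free tier and duration rounding; check the AWS pricing page), and "my-public-endpoint" is a hypothetical function name.

```python
# Assumed on-demand prices; verify against the current AWS Lambda pricing page.
PRICE_PER_REQUEST = 0.20 / 1_000_000
PRICE_PER_GB_SECOND = 0.0000166667


def cap_concurrency(lambda_client, function_name, limit):
    """Reserve (and thereby cap) the concurrency of a single function."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )


def worst_case_hourly_cost(concurrency, memory_mb, avg_duration_s):
    """Rough ceiling on the hourly bill if every concurrency slot is busy all hour."""
    gb_seconds = concurrency * (memory_mb / 1024) * 3600
    max_requests = concurrency * 3600 / avg_duration_s
    return gb_seconds * PRICE_PER_GB_SECOND + max_requests * PRICE_PER_REQUEST


if __name__ == "__main__":
    import boto3  # needs AWS credentials; the function name below is hypothetical

    cap_concurrency(boto3.client("lambda"), "my-public-endpoint", 10)
    print(f"{worst_case_hourly_cost(10, 1024, 0.1):.3f} USD/hour at most")
```

A reserved concurrency of 0 throttles every invocation, which can be handy as an emergency brake for an endpoint under attack.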

In addition to the per-account and per-function limits, you can also control Lambda exposure by putting the function behind API Gateway and creating API Gateway usage plans.

By creating a usage plan per customer with throttling and quota limits, you can control access to the API and the Lambda behind it and prevent runaway bills on your account.
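A sketch of a per-customer usage plan with boto3's API Gateway (REST API) client: create a plan with throttling and a daily quota, create an API key, and attach the key to the plan. The API id, stage, customer name and limit values are hypothetical; for the plan to be enforced, the API methods must also be configured to require an API key.

```python
def create_customer_plan(apigw, api_id, stage, customer,
                         rate_limit=10.0, burst_limit=20, daily_quota=10_000):
    """Create a throttled, quota-limited usage plan and an API key for one customer."""
    plan = apigw.create_usage_plan(
        name=f"plan-{customer}",
        apiStages=[{"apiId": api_id, "stage": stage}],
        throttle={"rateLimit": rate_limit, "burstLimit": burst_limit},
        quota={"limit": daily_quota, "period": "DAY"},
    )
    key = apigw.create_api_key(name=f"key-{customer}", enabled=True)
    # Link the key to the plan so requests with this key are throttled and counted.
    apigw.create_usage_plan_key(
        usagePlanId=plan["id"], keyId=key["id"], keyType="API_KEY"
    )
    return key["value"]  # the customer sends this in the x-api-key header


if __name__ == "__main__":
    import boto3  # needs AWS credentials; "abc123" and "prod" are hypothetical

    print(create_customer_plan(boto3.client("apigateway"), "abc123", "prod", "acme"))
```

Once a customer exceeds the rate or the daily quota, API Gateway rejects the request before it ever reaches the Lambda, so the abuse never shows up on the Lambda bill.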

#aws
Future of data science is data engineering

One aspect of data science that’s often over-emphasized is model tuning.

It's rare that a data scientist's focus should be on making a model 1% better. Typically it's much more important to get a "good enough" model out the door and in front of users. A "good enough" model in production is worth 100x more than a model that performs 5-10% better but lives in a Jupyter notebook. That is why software engineering and deployment skills are growing in importance relative to model tuning.

#ds #big_data
It is always darkest before the dawn. Just wait
The clear sign of a good interview is a slight feeling of hatred on both sides.
Talk to Books runs on Google's AI algorithms: a neural network answers users' questions with quotes from books. It can even handle abstract questions like "what is the meaning of life?". Check it out:

https://books.google.com/talktobooks/

#usefullinks
Over time, you may end up with a dozen copies of the same file lying in different corners of your system. The best idea is to track them down and eliminate them before they take over your hard drive.

FSlint is a utility for finding and cleaning up various forms of lint on a file system: duplicate files, empty directories, bad IDs, redundant temp files, broken symlinks and more. As far as I know, it supports only Linux.

$ sudo apt install fslint
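For a rough idea of how duplicate detection usually works, here is a small Python sketch (a swapped-in illustration, not FSlint's own code): group files by size first, which is cheap, and hash only the files whose sizes collide. Symlinks are skipped so a link is never reported as a copy of its target.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return groups of paths under `root` whose contents are identical."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks (including broken ones) and special files
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups


if __name__ == "__main__":
    for group in find_duplicates(os.path.expanduser("~")):
        print(group)
```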
ETL vs ELT

ETL (Extract, Transform, Load) is a popular data processing paradigm in data warehousing. Essentially, we extract data from one or more sources, clean it up, transform it into the structured form we need, and load it into a target database, data warehouse or data lake.

Currently there is a shift from ETL to ELT, where the transformation takes place inside the data warehouse rather than up front.

It seems to me that this shift, like most data management approaches and tools, stems from companies not knowing enough about their own data. Traditionally, a lot of planning and rigor had to go into loading data into the data warehouse to make it accessible to other people. Then the format of the input data changes, then the format of the output structure, and so on.

Tools such as Snowflake and AWS Redshift let you build an abstraction layer over the loaded data (even semi-structured data) that exposes a simple SQL API, so you can forget about the letter T up front.
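To make the difference concrete, here is a toy sketch that uses Python's built-in sqlite3 as a stand-in for a real warehouse (an assumption; Snowflake or Redshift would play that role in practice). ETL cleans the rows in application code before loading; ELT loads the raw strings untouched and defines the transformation as a SQL view inside the database. The table names and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical messy source data: inconsistent names, amounts stored as text.
RAW_ROWS = [(" alice ", "42.50"), ("BOB", "7"), ("alice", "10")]


def etl(conn, rows):
    """ETL: transform in application code first, then load only the clean result."""
    clean = [(name.strip().lower(), float(amount)) for name, amount in rows]
    conn.execute("CREATE TABLE orders_etl (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?)", clean)


def elt(conn, rows):
    """ELT: load raw strings as-is, then transform inside the database with SQL."""
    conn.execute("CREATE TABLE orders_raw (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?)", rows)
    conn.execute(
        """CREATE VIEW orders_elt AS
           SELECT lower(trim(customer)) AS customer,
                  CAST(amount AS REAL) AS amount
           FROM orders_raw"""
    )


def totals(conn, table):
    # The table name comes from trusted code here, never from user input.
    return conn.execute(
        f"SELECT customer, SUM(amount) FROM {table} GROUP BY customer ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    etl(conn, RAW_ROWS)
    elt(conn, RAW_ROWS)
    print(totals(conn, "orders_etl"))
    print(totals(conn, "orders_elt"))
```

Both paths produce the same totals; the difference is that in the ELT version the raw data stays in the database, so when the transformation rules change you only redefine the view instead of reloading everything.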

#big_data
Ask stupid questions

You don't want to be the stupid guy in the room, and neither do I. That is why you are afraid to ask questions in case they turn out to be dumb.

But there is always some sense in nonsense. Start by asking stupid questions, and they will lead you to sensible ones.

One who asks is a fool for a minute; one who fails to ask is a fool forever. But try asking Google first, and make sure your questions are well structured so you don't annoy people.

#dev #soft_skills
I want to be an optimist like Trump: he constantly uses so many positive adjectives that I've never used in my life: great, incredible, tremendous, successful, classy, winning...

Somebody should run analytics on his speeches. It would be an incredible job.