L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵
(ノ◕ヮ◕)ノ*:・゚✧ ✧゚・: *ヽ(◕ヮ◕ヽ)

helping robots conquer the earth and trying not to increase entropy using Python, Data Engineering and Machine Learning

http://luminousmen.com

License: CC BY-NC-ND 4.0
When I got my first computer, it had Windows installed on it. I don't know about you, but every time Windows threw an error that meant nothing to me, I went to the lower Internet to solve it. And I found out that there are many more interesting things on the lower Internet...

Windows has created a layer of programmers who can solve problems. Thanks to it for that.

But Windows OS is still a pile of horse shit.

That's why you can buy an ugly New Year's sweater with it: https://gear.xbox.com/pages/windows
Andreessen Horowitz published a detailed guide on the state of architectures for data infrastructure.

It covers data sources, data ingestion and transformation, storage, historical (for analytics) and predictive workloads, and outputs, along with the different tools that can be used for each, as well as case studies from several large companies on their data infrastructure setups.

Highly recommended.

https://a16z.com/2020/10/15/the-emerging-architectures-for-modern-data-infrastructure/
With their cool infographics, they have confirmed my observations on Data Management trends from the post.
The World Economic Forum has published a report forecasting how the labor market will change in the next five years. The main topic is the mass transition to online work and the digitalization of all professions, including those affected by COVID-19.

WEF analysts predict that by 2025 the world will have 97 million new jobs, while 85 million will disappear or be automated.

https://www.weforum.org/reports/the-future-of-jobs-report-2020
Demand for Data Engineers Up 50%

On the subject of the previous post: the Dice 2020 Tech Job Report labeled data engineer as the fastest-growing job in technology in 2019, with 50% year-over-year growth in the number of open positions. Check out the report:

https://techhub.dice.com/Dice-2020-Tech-Job-Report.html
The Difference Between Amateurs and Professionals

Interesting read, take a look. Basically, you can read it as a set of soft skills rules.

These are the points I like in particular:

Amateurs have a goal. Professionals have a process.

Amateurs go faster. Professionals go further.

https://fs.blog/2017/08/amateurs-professionals/

#soft_skills
Fuck it

FuckIt.py uses state-of-the-art technology to make sure your Python code runs whether it has any right to or not. Does some code have an error? Fuck it.

https://github.com/ajalt/fuckitpy

I want to draw your attention to the tests where you can see full proof that P ≠ NP.

#python
Complexity in distributed systems

The complexity of applications is often narrowed down to a single concept: Big O time complexity. Simply put, Big O notation describes how the runtime scales with respect to some input variables. With Big O notation, you can mathematically describe how a function (or application) will behave as the input size goes to infinity.

Although this makes sense academically, in real life engineers have more indicators of application complexity.

In addition to time complexity (expressed as a number of operations), we also have memory complexity. Time is unlimited: in theory, we can solve anything given an unlimited amount of time. Memory, on the other hand, is always a limited resource, and we should not forget about it when solving practical tasks. For example, when you have a huge array to sort and only one spare memory cell, you can't use the usual recursive QuickSort, which needs extra stack space for its recursion. You'll have to look for alternatives.
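One possible alternative for the memory-constrained case is heapsort, which sorts in place with a constant amount of extra memory. This is my own pick for illustration, not the only option. A minimal sketch, where the function name heapsort_in_place is just an assumption for this example:

```python
def heapsort_in_place(items):
    """Sort a list in place with O(1) extra memory (no recursion, no copies)."""

    def sift_down(root, end):
        # Push items[root] down until the max-heap property holds in items[:end].
        while True:
            child = 2 * root + 1
            if child >= end:
                return
            if child + 1 < end and items[child + 1] > items[child]:
                child += 1  # pick the larger of the two children
            if items[root] >= items[child]:
                return
            items[root], items[child] = items[child], items[root]
            root = child

    n = len(items)
    # Build a max-heap, starting from the last node that has children.
    for start in range(n // 2 - 1, -1, -1):
        sift_down(start, n)
    # Repeatedly move the current maximum to the end and shrink the heap.
    for end in range(n - 1, 0, -1):
        items[0], items[end] = items[end], items[0]
        sift_down(0, end)


data = [5, 1, 4, 2, 8, 0, 2]
heapsort_in_place(data)
print(data)  # [0, 1, 2, 2, 4, 5, 8]
```

The price is worse cache behavior than QuickSort in practice, but the time complexity stays O(n log n) and the only extra state is a few index variables.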

But apart from the limits on the number of operations and memory, we also have other resources that we need to consider.

In the current world of microservice architectures, distributed systems increasingly dominate development. The number of nodes, the number of executors on those nodes, and the network capacity of the system all affect the complexity of a distributed algorithm.

Informally, data engineers understand that the number and the types of inter-node communications affect the complexity of distributed algorithms. This is how the Big Data world thinks about shuffle operations, and why it tries to move from a synchronous model to a concurrent one where possible. But academic papers rarely mention such practical considerations, and without them it sometimes makes no sense to implement one or another algorithm that is SOTA on paper.
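To make the communication cost tangible, here is a toy sketch that counts how many records cross the "network" in a word count, with and without local pre-aggregation before the shuffle. Everything here is an assumption for illustration (the function shuffle_word_count, the in-memory "nodes"); it is not how any particular engine implements shuffles.

```python
from collections import Counter


def shuffle_word_count(partitions, num_reducers, combine):
    """Count words across partitions; return (counts, records_sent)."""
    reducers = [Counter() for _ in range(num_reducers)]
    records_sent = 0
    for partition in partitions:  # pretend each partition lives on its own node
        if combine:
            outgoing = Counter(partition).items()  # pre-aggregate locally
        else:
            outgoing = ((word, 1) for word in partition)  # one record per word
        for word, count in outgoing:
            # Route each record to a reducer by key, as a shuffle would.
            reducers[hash(word) % num_reducers][word] += count
            records_sent += 1
    total = Counter()
    for reducer in reducers:
        total.update(reducer)
    return total, records_sent


partitions = [["a", "b", "a", "a"], ["b", "b", "c"], ["a", "a", "a", "a"]]
naive_counts, naive_sent = shuffle_word_count(partitions, 2, combine=False)
combined_counts, combined_sent = shuffle_word_count(partitions, 2, combine=True)
print(naive_sent, combined_sent)  # 11 5
```

Both runs do the same amount of "work" in the Big O sense and produce the same counts, but the second one sends less than half the records between nodes.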

#ml #dev
Distributed systems are hard because:

▪️Engineers can’t combine error conditions. Instead, they must consider many permutations of failures. Most errors can happen at any time, independently of (and therefore, potentially, in combination with) any other error condition.
▪️The result of any network operation can be UNKNOWN, in which case the request may have succeeded, failed, or been received but not processed.
▪️Distributed problems occur at all logical levels of a distributed system, not just low-level physical machines.
▪️Distributed problems get worse at higher levels of the system, due to recursion.
▪️Distributed bugs often show up long after they are deployed to a system.
▪️Distributed bugs can spread across an entire system.
▪️Many of the above problems derive from the laws of physics of networking, which can’t be changed.

Author changes the "independent failure" term to "sharing fate". Lol
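The UNKNOWN outcome is the one that hurts most in practice, so here is a small sketch of one common way to live with it: retry with a request id that the server deduplicates. The Server and Network classes and the scripted failures are hypothetical, made up only for this example, and are not taken from the article.

```python
import enum


class Outcome(enum.Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    UNKNOWN = "unknown"  # no reply: the request may or may not have been applied


class Server:
    """Hypothetical server that applies each request id at most once."""

    def __init__(self):
        self.balance = 0
        self.applied_ids = set()

    def deposit(self, request_id, amount):
        if request_id not in self.applied_ids:  # duplicate retries are ignored
            self.applied_ids.add(request_id)
            self.balance += amount


class Network:
    """Hypothetical lossy network; `script` says what happens to each call."""

    def __init__(self, server, script):
        self.server = server
        self.script = list(script)

    def call(self, request_id, amount):
        event = self.script.pop(0) if self.script else "ok"
        if event == "reject":
            return Outcome.FAILED  # server refused, nothing applied
        if event == "drop_request":
            return Outcome.UNKNOWN  # lost on the way in, nothing applied
        self.server.deposit(request_id, amount)
        if event == "drop_reply":
            return Outcome.UNKNOWN  # applied, but the client cannot tell
        return Outcome.SUCCEEDED


def deposit_with_retry(network, request_id, amount, max_attempts=5):
    """Retry until a definite success; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if network.call(request_id, amount) is Outcome.SUCCEEDED:
            return attempt
    raise RuntimeError("gave up: outcome still not confirmed")


server = Server()
network = Network(server, ["drop_request", "drop_reply", "ok"])
attempts = deposit_with_retry(network, "req-1", 100)
print(attempts, server.balance)  # 3 100
```

The second attempt was actually applied even though the client never heard back, and without the request id the third attempt would have deposited the money twice.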

https://aws.amazon.com/builders-library/challenges-with-distributed-systems/

#big_data
How to know that your machine learning problem is hopeless?

A very interesting question on Stack Exchange:

How do you know that your data actually is hopeless and all the fancy models wouldn't do you any more good than predicting the average outcome for all cases or some other trivial solution?
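A first sanity check in that spirit is to compare your model against the trivial "predict the average" baseline on held-out data. Below is a minimal sketch; the function name skill_vs_mean_baseline and the toy numbers are my own assumptions and do not come from the linked thread.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def skill_vs_mean_baseline(y_train, y_test, model_predictions):
    """Return 1 - MSE(model) / MSE(baseline) on the test set.

    The baseline always predicts the training mean. A score near 0 (or below)
    means the model is no better than the average. Assumes the test targets
    are not all equal to the training mean, otherwise the ratio is undefined.
    """
    baseline = sum(y_train) / len(y_train)
    baseline_mse = mse(y_test, [baseline] * len(y_test))
    model_mse = mse(y_test, model_predictions)
    return 1.0 - model_mse / baseline_mse


y_train = [1.0, 2.0, 3.0]
y_test = [1.0, 3.0]
print(skill_vs_mean_baseline(y_train, y_test, [1.0, 3.0]))  # 1.0
print(skill_vs_mean_baseline(y_train, y_test, [2.0, 2.0]))  # 0.0
print(skill_vs_mean_baseline(y_train, y_test, [3.0, 1.0]))  # -3.0
```

If every fancy model you try keeps landing around zero on this kind of score, that is at least a hint that the problem might be in the data and not in the models.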

https://stats.stackexchange.com/questions/222179/how-to-know-that-your-machine-learning-problem-is-hopeless

#ml
Lessons learned from writing the book/1

So I wrote a book.

Writing a book is fucking hard. It is hard work, especially when you are not Stephen King. It is even harder when the publisher has a hard deadline. Fortunately, I did not have such conditions: I published the book myself and did the whole process from beginning to end. But I spent some time digging through Reddit threads about writing for publishers like Packt and O'Reilly, and I drew some conclusions that I want to share.

Let's start with the most interesting thing: it's unlikely that you'll be able to make money on a book. If you work with a publisher, though, the book is paid for in advance and you get a commission.

But no matter how you publish, with or without a publisher, you will spend a lot of time on it, far too much. If you convert the time spent on creating a concept, R&D, code writing and testing, design, formatting, publishing, and writing the material itself into hourly wages, you will realize that you "worked" at a loss.

And this is well understood not only by me but by many authors when they agree to write a book. The motivation is as follows:

- New experience
- Prestige and opportunity to add the word "author" to your LinkedIn profile, i.e. personal brand.
- The 4 upper levels of Maslow's hierarchy of needs
- "If you are doing something good, do not do it for free"

My motivation was the same.

Technical literature has a very small audience, so you should have no illusions about the number of sales. After all, I am a modest engineer who does not write bestsellers. Judging by my Reddit research, many authors spend more money on R&D than they end up earning. At least, on their first book.

amazon
leanpub
gumroad

#stuff
Lessons learned from writing the book/2

The next point that is important to realize when you think about writing a book is that you need skills.

Not skills in what you're going to write about (although that goes without saying), but the skills of writing and presenting the material.

I do not have such skills, so writing was very hard for me, even though most of the material was already more or less ready, taken from my own posts. It's hard to read the same thing for the fiftieth time and try to produce clear and consistent sentences that are more or less easy to read. And I'm not even talking about English.

It is a book about asynchronous programming. I chose the topic at random. I can't say that I write asynchronous code every day and know everything about this topic, but I think I have a good understanding of the concepts and have had experience writing such applications. And it seemed to me that few people understand and write about this topic at the concept level. In fact, I decided to help people like me, to answer the questions that sometimes arise in my head. Also, most of the material was already written in my blog, which made the whole process a bit easier.

Technical literature, in general, is difficult to write properly. You need a rough idea of what the potential reader already knows. You need to choose the right terms and use them consistently. And you shouldn't stray far from the topic (which I did not really succeed at). And you have to write not just clearly but with the right structure: approach the topic or concept from the right side, move smoothly from one chapter to another, and draw conclusions, even the obvious ones.

The logic here is very simple. The material should be something you would want to read and recommend to your friends, something you as an engineer would buy for your "working" library. This requires not only a thorough knowledge of the topic but also the right approach to delivering it, so that the reader understands the material and does not die of boredom.

#stuff
Lessons learned from writing the book/3

The author's work does not end after all chapters have been written. It is in the author's interest to help in R&D, correct formatting, participate in book marketing, and fill in product pages and descriptions.

Self-publication is much more complicated than traditional publication. No one provides you with an editor and a design team, so you are responsible for the whole project. If you happen to have friends or colleagues who can help you with any part of the work, it will certainly make things a little easier. But self-publishing still requires a lot of work and sometimes investment (in design, formatting, pictures, etc).

For me, formatting the book was a nightmare: I couldn't find decent tools at all. And I wasn't just looking for free ones. I cycled through LaTeX, pandoc, designer, calibre, Google Docs, Kindle Create, iBooks Author, and several others.

As a result, I wrote and formatted everything in Google Docs and then moved it to Kindle Create for Amazon publication.

The biggest problem with all those tools is code formatting. In Google Docs I managed to do it with widgets. But everywhere else you have two options: using pictures or writing a heap of CSS.

I stuck with images, and it looks disgraceful, but at least it does the job in all formats and sizes. In Kindle Create even the text cannot be selected!

After I finished the book, I found a better way to write and format ebooks, and I would recommend using leanpub.com for that. Not only do they provide you with great tools for creating ebooks, connecting authors with readers, managing sales, landing pages, and the like, they are also heavily involved in advancing the industry with new standards such as Markua (a book-targeted, enhanced flavor of Markdown). It can even generate all the required formats (pdf, epub, mobi), and it's free!

#stuff
Lessons learned from writing the book/4

In the end, I published the book in pdf and epub formats on several websites: amazon, leanpub, gumroad. Amazon is the easiest resource for publication, but the royalties are very low: the author gets only 30%.

In my opinion, leanpub is by far the most usable, well-crafted service on the market for ebook authors, and I couldn’t recommend them more. I like leanpub for its very straightforward approach. Everything is easy to set up, and the end result is pretty nice to look at. And the royalties there are 80%! No, it's not an advertisement.

So, the conclusion: I can't say that I'm very proud of my book, but I'm certainly not ashamed of it. Maybe one day I'll try writing again, and maybe on something closer to my topic of interest.

#stuff
I hope you are already overflowing with an unrelenting desire to finish this year. It obviously fell out of the normal distribution. Hopefully next year will be more representative.

Thank you for reading and motivating me to write more. Happy New Year!