The current Machine Learning tools landscape is growing every month. This reminds me of the explosion of "Big Data" tech. If ML follows a similar cycle, then in perhaps five years most tools will have receded into obscurity while a few frameworks and approaches become dominant.
Aside from the irrational but probably inevitable over-hype, I find this whole AI/ML space fascinating and frustrating at the same time. Most of the successful algorithms are much less complex/sophisticated than you might believe from the outside, and I think that's a very good sign because simplicity scales. On the other hand, there is a huge amount of brute-forcing: you can't really do serious research in this field without a few dozen million to spare, and I think that will always limit the innovation potential.
#ml
Chip Huyen
Machine Learning Tools Landscape v2 (+84 new tools)
[Twitter thread]
Good post comparing AWS Glue and EMR.
To sum up:
AWS Glue:
▪️Serverless ETL
▪️Crawlers can infer the schema, identify file formats and populate metadata
▪️Expensive
▪️Limitations on configuration
EMR:
▪️Managed service over EC2 instances, not serverless
▪️Flexibility, control over the configuration
▪️Vast use cases
▪️Cheaper option
If you're new to AWS configuration and only want to execute simple ETL, Glue might be a sensible option. However, if you wish to leverage Hadoop technologies and perform more complex transformations, EMR is the more viable solution.
You could replace Glue with EMR but not vice versa; EMR has far more capabilities than its serverless counterpart.
#aws #big_data
Medium
AWS Glue vs EMR
Amazon Web Services provide two service options capable of performing ETL: Glue and Elastic MapReduce (EMR). If they both do a similar job…
I know it's old, but it's still kind of relevant. Watch Dave Hahn, a Senior Engineer at Netflix, explain what operating at Netflix and scaling in the cloud is really like, from AWS re:Invent 2015.
https://youtu.be/-mL3zT1iIKw
YouTube
AWS re:Invent 2015: A Day in the Life of a Netflix Engineer (DVO203)
Watch Dave Hahn, a Senior Engineer at Netflix, explain what operating at Netflix and scaling in the cloud is really like. Learn more about Netflix & AWS: http://amzn.to/2iKYhzs
Convergence between the data lake and the data warehouse is no longer just talk; it's becoming a reality. “Convergence is happening from both sides. The data warehouse vendors are gradually moving from their existing model to the convergence of the data warehouse and data lake model. Similarly, the vendors who started their journey on the data lake side are now expanding into the data warehouse space.”
“The gap between these two platforms is going to narrow this year. Qubole is thinking in that direction: most of the problems we try to decipher can be solved by the data warehouse. The data lake is going to be the superset that can solve all data warehouse problems along with more capabilities.”
#big_data
Qubole
Is Data Lake and Data Warehouse Convergence a Reality? | Qubole
In this blog Debanjan Saha, VP and GM of Data Analytics services at Google Cloud, shares his views on the data lake and how it compares and contrasts with data warehousing.
The blog narrates how the underlying data infrastructure influences ML development, in line with the recent trends around "Reverse ETL" and the modern cloud-native data stack. It also reiterates that we are still in the early stages of MLOps, data quality tooling, and unified data architecture on the path to the industrialization of ML development.
https://medium.com/validio/ml-data-trends-wrapping-up-2020-and-looking-into-2021-beyond-b3ff1eadc211
#ml
Medium
ML & Data Trends: Wrapping up 2020 and looking into 2021 & beyond
2020 brought a digitalization explosion across the world. Microsoft estimates that the first two months of the pandemic (March & April)…
Apache Airflow published the 2020 Airflow survey results:
🔸 13.79% adoption by the general developer community outside of data engineers.
🔸 85% of people using Airflow are likely / very likely to recommend Airflow.
🔸 Airflow's Local executor is more popular than the Kubernetes executor.
🔸 Slack & GitHub are the go-to places for technical questions, 2x more than StackOverflow.
#big_data
Apache Airflow
Airflow Survey 2020
We observe steady growth in the number of users as well as in the number of active contributors, so listening to and understanding our community is of high importance.
Most of the major roadblocks I ran into were not technical, they were communicative. Sure, there were always technical challenges but that’s the role of an engineer, to fix the technical challenges. Never underestimate the importance of communication, internal and external. There’s nothing worse than solving a technical challenge when it was the wrong technical challenge to be solved.
#soft_skills
And this week we're gonna talk about a hobby project Van Rossum did over Christmas break 1989. Yep, it's Python. Again, nothing specific, just some rambling about the topic, should be interesting though.
There is a common misconception that the GIL was invented to protect developers from problems with concurrent access to data. But this is not true.
The GIL will, of course, prevent you from parallelizing an application using threads (but not processes). Simply put, the GIL is a lock that must be held before any access to the Python interpreter (it doesn't matter whether Python code is executing or calls are made via the Python C API). Therefore, the GIL protects the interpreter's internal structures from inconsistent states, but you still have to use synchronization primitives for your own data, just like in any other language.
#python
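A minimal sketch of the point about synchronization primitives. The counter update below is a read-modify-write sequence; the GIL only makes individual interpreter operations atomic, not the whole sequence, so a lock is still needed (all names here are made up for illustration):

```python
import threading

counter = 0
lock = threading.Lock()

def increment_many(n):
    """Increment the shared counter n times while holding a lock.

    Without the lock, two threads could interleave between the read
    and the write of `counter += 1` and lose updates, GIL or not.
    """
    global counter
    for _ in range(n):
        with lock:
            counter += 1

threads = [threading.Thread(target=increment_many, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 -- deterministic only because of the lock
```

Drop the `with lock:` line and the final count may come up short, depending on how the interpreter schedules the threads.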
The popular way to declare an abstract method in Python is to raise a NotImplementedError exception:
def func(self):
    raise NotImplementedError
Though it's pretty popular and even has IDE support (PyCharm considers such a method to be abstract), this approach has a downside: you get the error only upon the method call, not upon class instantiation.
Use the abc module to avoid this problem:
from abc import ABCMeta, abstractmethod

class Service(metaclass=ABCMeta):
    @abstractmethod
    def func(self):
        pass
#python
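A quick sketch of the difference in practice (the LazyService name is mine, just for contrast):

```python
from abc import ABCMeta, abstractmethod

class LazyService:
    """The NotImplementedError approach: fails only when func() is called."""
    def func(self):
        raise NotImplementedError

class Service(metaclass=ABCMeta):
    """The abc approach: fails already at instantiation time."""
    @abstractmethod
    def func(self):
        pass

s = LazyService()          # instantiation succeeds -- the bug stays hidden
try:
    s.func()               # ...until the method is actually called
except NotImplementedError:
    print("error only on call")

try:
    Service()              # abc catches the mistake immediately
except TypeError as e:
    print("error on instantiation:", e)
```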
In CPython, if there are no references to an object, it is destroyed immediately instead of waiting for garbage collection. The GC is only needed for the complicated cases where we have cyclic references.
#python
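You can watch both cases with a weakref (this is CPython-specific behavior; `Node` is just an illustrative class):

```python
import gc
import weakref

class Node:
    pass

# No cycle: the object dies the moment its refcount hits zero.
n = Node()
alive = weakref.ref(n)
del n
print(alive() is None)  # True -- destroyed immediately, no GC involved

# Cycle: the refcount never reaches zero, only the cyclic GC can reclaim it.
gc.disable()            # make the demonstration deterministic
m = Node()
m.self_ref = m          # reference cycle
alive = weakref.ref(m)
del m
print(alive() is None)  # False -- the cycle keeps it alive
gc.collect()            # the cycle detector breaks the loop
print(alive() is None)  # True
gc.enable()
```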
A list may contain itself. Python detects this and does not loop infinitely when printing it:
>>> a = []
>>> a.append(a)
>>> a
[[...]]
#python
Python doesn't have tail recursion optimization, not because Guido couldn't handle it, but because he doesn't want to overcomplicate things.
If you really want to, you can implement a Y-combinator with optimization and use it.
#python
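Not a full Y-combinator, but a small trampoline sketch in the same spirit: the function returns a zero-argument thunk instead of making a tail call, and a driver loop unwinds it, so deep "recursion" never touches the call stack. All names here are made up for illustration:

```python
def trampoline(fn, *args):
    """Keep calling returned thunks until we get a real value back."""
    result = fn(*args)
    while callable(result):
        result = result()
    return result

def factorial(n, acc=1):
    if n <= 1:
        return acc
    # Return a thunk instead of recursing directly.
    return lambda: factorial(n - 1, acc * n)

print(trampoline(factorial, 5))  # 120
# Far beyond the default recursion limit, yet no RecursionError:
print(trampoline(factorial, 5000) > 0)  # True
```

The obvious caveat of this sketch: the driver treats any callable result as a thunk, so it only works for functions whose real return values aren't callables.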
Just like with tail recursion, Guido wanted to keep lambda functions out of the language. A lambda function is just a sort of syntactic sugar: it creates an unnamed function, and that's literally it. There is no magic to it. We all know lambda functions were introduced into the language, but there was a long fight behind it, which Guido did not win. Lambda still exists because it is extremely convenient, and nothing else.
Those two are equivalent:
foo = lambda x: x * 2
def foo(x):
    return x * 2
foo.__qualname__ = '<lambda>'
#python
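You can check the "no magic" claim with the dis module: both forms compile down to the same bytecode operations (the names foo_lambda/foo_def are mine, to keep both versions around at once):

```python
import dis

foo_lambda = lambda x: x * 2

def foo_def(x):
    return x * 2
foo_def.__qualname__ = '<lambda>'

# Same result...
print(foo_lambda(21), foo_def(21))  # 42 42

# ...and the same bytecode operations underneath.
lambda_ops = [i.opname for i in dis.get_instructions(foo_lambda)]
def_ops = [i.opname for i in dis.get_instructions(foo_def)]
print(lambda_ops == def_ops)  # True
```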
A function can have two return statements executed in a single call. For example:
def foo():
    try:
        return 1
    finally:
        return 2
Here 2 will be returned: the finally block always runs, and its return overrides the one in try.
#python
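A related, nastier consequence worth knowing: a return in finally doesn't just win over the return in try, it also silently swallows any in-flight exception:

```python
def swallow():
    try:
        raise ValueError("you will never see this")
    finally:
        # The return in finally cancels exception propagation entirely.
        return 2

print(swallow())  # 2 -- no ValueError ever reaches the caller
```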
There are compilers for Python code, and not only JIT ones like Numba but also ahead-of-time ones. For example, Cython and Nuitka compile your Python code into true machine instructions rather than having it interpreted.
Why?
Basically for the sake of both performance gains and a more portable runtime. So basically it's for the guys who shouldn't use Python in the first place.
#python
Remember I told you about the numbers from -5 to 256 being interned? That is, these numbers are preallocated and cached in memory. Well, there is more to it.
You can access this memory if you want and change the values. Say, change the literal 4 to the value 5. I'll probably be damned by many people for this post, but here is a sample code:
>>> import ctypes
>>> ctypes.memmove(id(4) + 24, id(5) + 24, 8)
>>> print(2 * 2)  # 5
Have fun debugging!
P.S. Try to replace 0 with 1
#python
Telegram
LuminousmenBlog
Perhaps not every Python developer knows this interesting property of CPython that makes newcomers go crazy:
>>> a = 255
>>> b = 255
>>> a == b
True
>>> a is b
True
The double equal operator checks that the objects' values are equal, and the is operator…
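A way to see the cache boundary without relying on literals (the compiler may fold identical literals in one code object into a single constant, which would blur the picture) is to construct the ints at runtime. This is a CPython implementation detail:

```python
# Small ints (-5..256) come from CPython's preallocated cache,
# so any way of producing them yields the very same object.
x = int("100")
y = int("100")
print(x is y)   # True -- both are the cached 100

# 257 is outside the cache, so each construction makes a fresh object.
p = int("257")
q = int("257")
print(p is q)   # False
```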