L̶u̵m̶i̵n̷o̴u̶s̶m̶e̵n̵B̶l̵o̵g̵
(ノ◕ヮ◕)ノ*:・゚✧ ✧゚・: *ヽ(◕ヮ◕ヽ)

helping robots conquer the earth and trying not to increase entropy using Python, Data Engineering and Machine Learning

http://luminousmen.com

License: CC BY-NC-ND 4.0
Services by type in various clouds and on-premises.

The picture from the beginning of 2019 is useful for finding similar services in different clouds.

For those who work only with AWS and consider it the leader (which it is), it may be a revelation that the range of services in Azure is larger (true story).

#aws
Python 3.8 is out
The one good feature is importlib.metadata; all the others are garbage. I consider positional-only arguments a dangerous feature, and I can't imagine ever using them. When will the time come that we remove all the useless crap from Python? Then it would be a great language.

For more, check out the presentation or go to the official docs.
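
For reference, here is a minimal sketch of the two Python 3.8 features mentioned above. The helper names `installed_version` and `power` are made up for illustration; `importlib.metadata.version`, `PackageNotFoundError` and the `/` marker are part of Python 3.8.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def power(base, exponent, /):
    """Parameters listed before `/` are positional-only (PEP 570)."""
    return base ** exponent


print(installed_version("pip"))  # a version string if pip is installed, else None
print(power(2, 10))              # 1024

try:
    power(base=2, exponent=10)   # keyword use of positional-only parameters is rejected
except TypeError as exc:
    print("TypeError:", exc)
```

Whether the `/` marker is dangerous or useful is a matter of taste; the sketch only shows what it does.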

#python
Modules and Packages: Live and Let Die! is David Beazley's insane three-hour presentation about how modules, packages and the import system are built in Python. It is pretty comprehensive, from the very basics to the internals, with examples.

Recommended👌

#python
NotImplemented

Imagine a Matrix class you're implementing:

class Matrix:
    """A simple Python matrix class"""

    def __init__(self, m, n):
        self.rows = []
        self.m = m
        self.n = n

For some reason, you don't want your matrix instances to support addition. OK, so you implement it as follows:

    def __add__(self, mat):
        """Achtung! Not implemented!"""
        return NotImplemented

No, it will not work!

>>> matrix + numpy.zeros((m, n))
array([...])

Why?

From docs:

> When a binary (or in-place) method returns NotImplemented the interpreter will try the reflected operation on the other type (or some other fallback, depending on the operator). If all attempts return NotImplemented, the interpreter will raise an appropriate exception. Incorrectly returning NotImplemented will result in a misleading error message or the NotImplemented value being returned to Python code.

In other words, returning NotImplemented hands the operation over to the other operand (in our case, the numpy array).

To do it right, raise the exception yourself:
    def __add__(self, mat):
        """Achtung! Not implemented!"""
        raise TypeError
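
The fallback is easy to reproduce without numpy. Below is a minimal sketch: `Greedy` is a hypothetical stand-in for any type that defines a reflected `__radd__`, and `StrictMatrix` is an illustrative name for the variant that raises.

```python
class Matrix:
    """Returns NotImplemented, so Python falls back to the other operand."""

    def __init__(self, m, n):
        self.m = m
        self.n = n

    def __add__(self, other):
        return NotImplemented


class StrictMatrix(Matrix):
    """Raises TypeError, so the reflected operation is never tried."""

    def __add__(self, other):
        raise TypeError("addition is not supported for StrictMatrix")


class Greedy:
    """Hypothetical type whose reflected add accepts anything."""

    def __radd__(self, other):
        return "handled by Greedy.__radd__"


# Matrix.__add__ returns NotImplemented, so Greedy.__radd__ takes over.
print(Matrix(2, 2) + Greedy())  # handled by Greedy.__radd__

# StrictMatrix.__add__ raises before Python can try Greedy.__radd__.
try:
    StrictMatrix(2, 2) + Greedy()
except TypeError as exc:
    print("TypeError:", exc)
```

If no operand handles the operation (for example `Matrix(2, 2) + 1`), Python raises a TypeError on its own, so returning NotImplemented only "leaks" when the other side is willing to take over.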

#python
To tune Spark job performance and debug finished jobs, engineers need information. You can get it from Spark event logs and explore it in the History Server.

https://luminousmen.com/post/spark-history-server-and-monitoring-jobs-performance
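
As a rough sketch, event logging is switched on with a few standard Spark properties. The property names below are regular Spark settings; the log directory is a placeholder assumption, and the helper functions are made up for illustration.

```python
# Placeholder location (an assumption); point it at storage the History Server can read.
EVENT_LOG_DIR = "file:///tmp/spark-events"


def event_log_conf(log_dir=EVENT_LOG_DIR):
    """Properties that make applications write event logs the History Server can load."""
    return {
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": log_dir,
        # Read by the History Server process, not by the application itself.
        "spark.history.fs.logDirectory": log_dir,
    }


def render_spark_defaults(conf):
    """Render the properties as whitespace-separated lines for spark-defaults.conf."""
    return "\n".join(f"{key} {value}" for key, value in sorted(conf.items()))


print(render_spark_defaults(event_log_conf()))
```

The same key/value pairs can also be passed with `--conf` on `spark-submit` instead of going into `spark-defaults.conf`.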
There are many myths in the IT field:
“You can unsubscribe from spam”,
“Backups are not needed”,
“Two antiviruses are better than one”
Mute all useless java-related logging in pyspark:

import logging


def mute_spark_logs(sc):
    """Mute Spark info logging (show only error logs)"""
    logger = sc._jvm.org.apache.log4j  # noqa
    logger.LogManager.getLogger("org").setLevel(logger.Level.ERROR)
    logger.LogManager.getLogger("akka").setLevel(logger.Level.ERROR)
    logging.info("Spark muted.")


gist

#python #spark
Useful Python data structures for working with numerical data and speeding up your computations:

numpy arrays: for N-dimensional structured arrays
scipy.spatial: for spatial queries like distances, nearest neighbors, etc.
pandas: for SQL-like grouping and aggregations
dask: parallel arrays, dataframes, and lists that extend to larger-than-memory or distributed environments
xarray: for grouping across multiple dimensions
scipy.sparse: sparse matrices for 2-dimensional structured data
sparse: sparse arrays for N-dimensional structured data
scipy.sparse.csgraph: for graph-like problems (e.g. finding the shortest path)

#python
Punctuation removal

You can easily remove all punctuation with this snippet:

import string

input_str = "This &is [an] example? {of} string. with.? punctuation!!!!"  # Sample string
# The third argument of str.maketrans lists the characters to delete
result = input_str.translate(str.maketrans("", "", string.punctuation))
print(result)

Output:

This is an example of string with punctuation
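
Note that `string.punctuation` only covers ASCII. If the text may contain typographic quotes or ellipses, one possible extension (a sketch, not part of the original snippet; `strip_punctuation` is a made-up name) is to also drop every character whose Unicode category starts with "P":

```python
import string
import unicodedata

_ASCII_PUNCTUATION = set(string.punctuation)


def strip_punctuation(text):
    """Drop ASCII punctuation plus any character in a Unicode punctuation category."""
    return "".join(
        ch
        for ch in text
        if ch not in _ASCII_PUNCTUATION
        and not unicodedata.category(ch).startswith("P")
    )


print(strip_punctuation("“Hello”, world… ok!"))  # Hello world ok
```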

#python
There is a secret that needs to be understood in order to write good software documentation: there isn’t one thing called documentation, there are four.

They are: tutorials, how-to guides, explanation and technical reference. They represent four different purposes or functions, and require four different approaches to their creation. Understanding the implications of this will help improve most software documentation - often immensely.

Check out Daniele Procida's talk at PyCon.

#dev #soft_skills
From the C++ Core Guidelines:

> Scream when you see a macro that isn't just used for source control (e.g., #ifdef)

It somewhat fits

source