LuminousmenBlog
(ノ◕ヮ◕)ノ*:・゚✧ ✧゚・: *ヽ(◕ヮ◕ヽ)

helping robots conquer the earth and trying not to increase entropy using Python, Data Engineering and Machine Learning

http://luminousmen.com

License: CC BY-NC-ND 4.0
Python 3.8 is out
The one good feature is importlib.metadata; all the others are garbage. I consider positional-only arguments a dangerous feature, and I can't imagine ever using them. When will the time come to remove all the useless crap from Python? Then it would be a great language.

For more, check out the presentation or go to the official docs.
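
For reference, here is a minimal sketch of the two features mentioned above, assuming Python 3.8 or newer. The functions scale and installed_version are made-up names for illustration, and "pip" is only an example package that may or may not be installed.

```python
from importlib import metadata


def scale(value, factor, /, *, clamp=None):
    """Multiply value by factor; the `/` makes both positional-only."""
    result = value * factor
    return result if clamp is None else min(result, clamp)


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(scale(2, 3))           # positional call works
print(scale(2, 3, clamp=5))  # keyword-only argument after the `*`

try:
    scale(value=2, factor=3)  # names before `/` cannot be used as keywords
except TypeError as exc:
    print("rejected:", exc)

print(installed_version("pip"))  # a version string, or None if not installed
```

The keyword call is rejected with a TypeError, which is exactly the behaviour that makes the feature controversial: callers can no longer rely on parameter names.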

#python
Modules and Packages: Live and Let Die! is David Beazley's insane three-hour presentation about how modules, packages and the import system are built in Python. It is pretty comprehensive, from the very basics to the internals, with examples.

Recommended👌

#python
NotImplemented

Imagine a Matrix class you're implementing:

class Matrix:
    """A simple Python matrix class"""

    def __init__(self, m, n):
        self.rows = []
        self.m = m
        self.n = n

Suppose that, for some reason, you don't want to support addition on your matrix instances. OK, you implement it as follows:

    def __add__(self, mat):
        """Achtung! Not implemented!"""
        return NotImplemented

No, it will not work!

>>> matrix + numpy.zeros((m, n))
array([...])

Why?

From docs:

> When a binary (or in-place) method returns NotImplemented the interpreter will try the reflected operation on the other type (or some other fallback, depending on the operator). If all attempts return NotImplemented, the interpreter will raise an appropriate exception. Incorrectly returning NotImplemented will result in a misleading error message or the NotImplemented value being returned to Python code.

In other words, returning NotImplemented hands the operation over to the other operand (in our case, the numpy array).

To do it right, raise an exception instead:

    def __add__(self, mat):
        """Achtung! Not implemented!"""
        raise TypeError("Matrix does not support addition")
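
The fallback described in the docs can be seen without numpy. Below is a small sketch with three hypothetical classes (Declines, Greedy and Strict are invented for this demo): one returns NotImplemented, one accepts anything through the reflected __radd__, and one raises TypeError outright.

```python
class Declines:
    """Left operand that returns NotImplemented from __add__."""

    def __add__(self, other):
        return NotImplemented  # "not my job", so Python asks the other operand


class Greedy:
    """Right operand that accepts any addition via the reflected method."""

    def __radd__(self, other):
        return "handled by Greedy.__radd__"


class Strict:
    """Left operand that refuses addition for real."""

    def __add__(self, other):
        raise TypeError("addition is not supported for Strict")


# NotImplemented passes control to Greedy.__radd__, so the addition "works".
delegated = Declines() + Greedy()
print(delegated)

# Raising TypeError stops the operation before the reflected method is tried.
try:
    Strict() + Greedy()
    strict_outcome = "no error"
except TypeError as exc:
    strict_outcome = f"TypeError: {exc}"
print(strict_outcome)
```

Only when every operand returns NotImplemented does Python raise its own TypeError, so returning it is a way of saying "ask the other side", not "this is forbidden".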

#python
To tune Spark job performance and debug finished jobs, engineers need information. You can get it from Spark events and inspect it in the History Server.

https://luminousmen.com/post/spark-history-server-and-monitoring-jobs-performance
There are many myths in the IT field:
“You can unsubscribe from spam”,
“Backups are not needed”,
“Two antiviruses are better than one”
Mute all useless Java-related logging in PySpark:

import logging


def mute_spark_logs(sc):
    """Mute Spark info logging (show only error logs)."""
    log4j = sc._jvm.org.apache.log4j  # noqa: access the JVM-side log4j package
    log4j.LogManager.getLogger("org").setLevel(log4j.Level.ERROR)
    log4j.LogManager.getLogger("akka").setLevel(log4j.Level.ERROR)
    logging.info("Spark muted.")


gist

#python #spark
Useful Python data structures for working with numerical data and speeding up your computations:

numpy arrays - for N-dimensional structured arrays
scipy.spatial - for spatial queries like distances, nearest neighbors, etc.
pandas - for SQL-like grouping and aggregations
dask - parallel arrays, dataframes, and lists that extend to larger-than-memory or distributed environments
xarray - for grouping across multiple dimensions
scipy.sparse - sparse matrices for 2-dimensional structured data
sparse - for N-dimensional structured data
scipy.sparse.csgraph - for graph-like problems (e.g. finding the shortest path)

#python
Punctuation removal

You can easily remove all punctuation with this snippet (Python 3):

import string

input_str = "This &is [an] example? {of} string. with.? punctuation!!!!"  # Sample string
# Build a translation table that deletes every punctuation character
result = input_str.translate(str.maketrans("", "", string.punctuation))
print(result)

Output:

This is an example of string with punctuation

#python
There is a secret that needs to be understood in order to write good software documentation: there isn’t one thing called documentation, there are four.

They are: tutorials, how-to guides, explanation and technical reference. They represent four different purposes or functions, and require four different approaches to their creation. Understanding the implications of this will help improve most software documentation - often immensely.

Check out Daniele Procida's talk at PyCon.

#dev #soft_skills
From the C++ Core Guidelines:

>Scream when you see a macro that isn't just used for source control (e.g., #ifdef)

It somewhat fits

source
Dictionaries in CPython are everywhere: classes, global variables and kwargs parameters are based on them, and the interpreter creates thousands of dictionaries even if you did not write a single curly bracket in your script. It is not surprising that their implementation keeps improving and acquiring new tricks.

The internal structure of dictionaries in Python is not limited to buckets and closed hashing. If you don't know the number of elements in a dictionary you have just created, how much memory is spent on each element, why the dictionary is now (CPython 3.6+) implemented as two arrays and how that relates to maintaining insertion order, or you simply haven't watched Raymond Hettinger's presentation "Modern Python Dictionaries: A confluence of a dozen great ideas", then the time has come.
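
To give a feel for the two-array idea, here is a toy sketch. It is not CPython's implementation: the class name CompactDict is made up, probing is plain linear (CPython uses a different probing scheme), and there is no resizing or deletion. It only shows how a sparse index array plus a dense entries array gives insertion order for free.

```python
class CompactDict:
    """Toy model of a compact, insertion-ordered hash table."""

    FREE = -1  # marker for an empty slot in the index array

    def __init__(self, size=8):
        self.indices = [self.FREE] * size  # sparse: slot -> position in entries
        self.entries = []                  # dense: (hash, key, value), in insertion order

    def _probe(self, key):
        """Yield candidate slots for a key using simple linear probing."""
        size = len(self.indices)
        start = hash(key) % size
        for step in range(size):
            yield (start + step) % size

    def __setitem__(self, key, value):
        for slot in self._probe(key):
            pos = self.indices[slot]
            if pos == self.FREE:
                # New key: append to the dense array and remember its position.
                self.indices[slot] = len(self.entries)
                self.entries.append((hash(key), key, value))
                return
            if self.entries[pos][1] == key:
                # Existing key: overwrite in place, the order does not change.
                self.entries[pos] = (hash(key), key, value)
                return
        raise RuntimeError("table is full (the toy model never resizes)")

    def __getitem__(self, key):
        for slot in self._probe(key):
            pos = self.indices[slot]
            if pos == self.FREE:
                break
            if self.entries[pos][1] == key:
                return self.entries[pos][2]
        raise KeyError(key)

    def keys(self):
        """Keys come straight from the dense array, so they keep insertion order."""
        return [key for _, key, _ in self.entries]


d = CompactDict()
d["b"] = 1
d["a"] = 2
d["c"] = 3
d["b"] = 10  # update keeps the original position of "b"
print(d.keys(), d["b"])
```

The small integers in the index array are cheap, and the entries array has no holes, which is where both the memory savings and the ordering come from in this simplified picture.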

Recommended 👌

#python
Singleton pattern in Python

Do you like Singletons? I don't either; they are a bit complicated.

But you know what? I've never seen a good implementation of the Singleton pattern in any code (except some famous libraries) or in any interview. We need to fix that!

Please check the following implementation:

from weakref import WeakValueDictionary


class Singleton(type):
    # Weak references let an instance be garbage-collected once nobody uses it
    _instances = WeakValueDictionary()

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            instance = super().__call__(*args, **kwargs)
            cls._instances[cls] = instance
        return cls._instances[cls]


class Config(metaclass=Singleton):
    pass
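
Here is a quick usage sketch of that metaclass, repeated so the block runs on its own. The env attribute on Config is a made-up detail for the demo, and the last check assumes CPython, where an object is freed as soon as its last strong reference goes away.

```python
import gc
from weakref import WeakValueDictionary


class Singleton(type):
    _instances = WeakValueDictionary()

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            instance = super().__call__(*args, **kwargs)
            cls._instances[cls] = instance
        return cls._instances[cls]


class Config(metaclass=Singleton):
    def __init__(self, env="dev"):  # `env` is a hypothetical attribute
        self.env = env


first = Config("prod")
second = Config("test")  # arguments are ignored: the instance already exists
same_object = first is second
env_seen = second.env
print(same_object, env_seen)

# Drop every strong reference; the weak dictionary entry disappears with them.
del first, second
gc.collect()
cached_after_release = Config in Singleton._instances
print(cached_after_release)
```

Note the trade-off: the second call silently ignores its arguments, and once all references are gone the next Config() call builds a fresh instance.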


Do you know a better implementation? Send me yours and we will discuss it.

#python