Nice tips for those who use Airflow, worth reading
https://medium.com/datareply/airflow-lesser-known-tips-tricks-and-best-practises-cf4d4a90f8f
#big_data
Medium
Airflow: Lesser Known Tips, Tricks, and Best Practises
Lesser known Tips, Tricks, and Best Practises to use Apache Airflow and develop DAGs like a Pro
Runtime type checking
By default, function annotations do not affect how your code runs; they merely document intent and let linters raise errors.
However, you can enforce type checking at runtime with tools like enforce. This can help in debugging, since there are many cases where static type hints alone don't catch the problem.
import enforce
from typing import List

@enforce.runtime_validation
def foo(text: str) -> None:
    print(text)

foo('Hi')  # ok
foo(5)     # fails

@enforce.runtime_validation
def any2(x: List[bool]) -> bool:
    return any(x)

any([False, False, True, False])   # True
any2([False, False, True, False])  # True

any(['False'])   # True
any2(['False'])  # fails

any([False, None, "", 0])   # False
any2([False, None, "", 0])  # fails
#python
GitHub
GitHub - RussBaz/enforce: Python 3.5+ runtime type checking for integration testing and data validation
Python 3.5+ runtime type checking for integration testing and data validation - RussBaz/enforce
Use pathlib
pathlib is a default module in Python 3 that helps you avoid tons of confusing os.path.join calls:

from pathlib import Path

dataset_dir = 'data'
dirpath = Path('/path/to/dir/')
full_path = dirpath / dataset_dir

for filepath in full_path.iterdir():
    with filepath.open() as f:
        ...  # do stuff with f
Previously it was always tempting to use string concatenation (concise, but obviously bad); now with pathlib the code is safe, concise, and readable.

Also, pathlib.Path has a bunch of methods and properties that I previously had to google:

p.exists()
p.is_dir()
p.parts
p.with_name('sibling.png')  # only change the name, but keep the folder
p.with_suffix('.jpg')       # only change the extension, but keep the folder and the name
p.chmod(mode)
p.rmdir()
See how easy it is to get all images recursively (without the glob module):

found_images = Path('/path/').glob('**/*.jpg')

#python
A guide to bucketing: an optimization technique that uses buckets to determine data partitioning and avoid data shuffles.
https://luminousmen.com/post/the-5-minute-guide-to-using-bucketing-in-pyspark
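Roughly what that looks like in practice (a minimal sketch, not taken from the linked post; the table and column names are invented): write both sides of a join bucketed by the join key into the same number of buckets, and Spark can later join them without shuffling either side.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bucketing-demo").getOrCreate()

orders = spark.createDataFrame(
    [(1, 100.0), (2, 30.0), (1, 12.5)], ["customer_id", "amount"])
customers = spark.createDataFrame(
    [(1, "Alice"), (2, "Bob")], ["customer_id", "name"])

# Persist both sides bucketed (and sorted) by the join key, same bucket count
(orders.write.bucketBy(8, "customer_id").sortBy("customer_id")
       .mode("overwrite").saveAsTable("orders_bucketed"))
(customers.write.bucketBy(8, "customer_id").sortBy("customer_id")
          .mode("overwrite").saveAsTable("customers_bucketed"))

# A sort-merge join on the bucketing column can now skip the shuffle:
# the physical plan should show no Exchange on either side
joined = spark.table("orders_bucketed").join(
    spark.table("customers_bucketed"), "customer_id")
joined.explain()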
Blog | iamluminousmen
The 5-minute guide to using bucketing in Pyspark
Learn how to optimize your Apache Spark queries with bucketing in Pyspark. Discover how bucketing can enhance performance by avoiding data shuffling.
Pip constraints files
In Python, it is common practice to write all the application dependencies that are installed via pip into a separate text file called requirements.txt. It's good practice to fully specify package versions in your requirements file. And in our case, everything will be there: both the direct dependencies of our application and the dependencies of those dependencies, and so on.

But sometimes, especially on a long-lived project, it's hard to understand which dependencies were the original ones. They need to be updated on time, and you don't want to depend on packages that are outdated or no longer needed for some reason.

For example, which of the following dependencies is the original?

# requirements.txt
numpy==1.17.4
pandas==0.24.2
python-dateutil==2.8.1
pytz==2019.3
six==1.13.0

Yes, it's pandas.

One of the mechanisms for separating dependencies is implemented using another text file called constraints.txt. It looks exactly like requirements.txt:

# constraints.txt
numpy==1.17.4
python-dateutil==2.8.1
pytz==2019.3
six==1.13.0

Constraints files differ from requirements files in one key way: putting a package in the constraints file does not cause the package to be installed, whereas a requirements file will install all packages listed. Constraints files are simply requirements files that control which version of a package will be installed but provide no control over the actual installation.

To use this file, you can reference it from the requirements.txt file:

# requirements.txt
-c constraints.txt
pandas==0.24.2

or pass it on the command line:

pip install -r requirements.txt -c constraints.txt

Either way, all packages from requirements.txt are installed, with constraints.txt pinning the versions.

#python
12factor.net
The Twelve-Factor App
A methodology for building modern, scalable, maintainable software-as-a-service apps.
Go test the most advanced neural network model for generating text, GPT-2. You write a phrase, and the neural network extends it into a short text.
Try it yourself: https://talktotransformer.com/
#ml
Openai
Better language models and their implications
We’ve trained a large-scale unsupervised language model which generates coherent paragraphs of text, achieves state-of-the-art performance on many language modeling benchmarks, and performs rudimentary reading comprehension, machine translation, question…
"In all likelihood, sorting is one of the most researched classes of algorithms. It is a fundamental task in Computer Science, both on its own and as a step in other algorithms. Efficient algorithms for sorting and searching are now taught in core undergraduate classes. Are they at their best, or is there more blood to squeeze from that stone? This talk will explore a few less known – but more allegro! – variants of classic sorting algorithms. And as they say, the road matters more than the destination. Along the way, we’ll encounter many wondrous surprises and we’ll learn how to cope with the puzzling behavior of modern complex architectures."
If you know who Andrei Alexandrescu is, then it's recommended 👌
https://youtu.be/FJJTYQYB1JQ
#dev
YouTube
Sorting Algorithms: Speed Is Found In The Minds of People - Andrei Alexandrescu - CppCon 2019
http://CppCon.org
Discussion & Comments: https://www.reddit.com/r/cpp/
Presentation Slides, PDFs, Source Code and other presenter materials are available at: https://github.com/CppCon/CppCon2019
DataFrames are the wave of the future in the Spark world, so let's push your PySpark SQL knowledge by working through the various join types.
https://luminousmen.com/post/introduction-to-pyspark-join-types
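As a warm-up, here is a tiny sketch of the join types the post covers (the DataFrames and column names below are made up for illustration):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("join-types-demo").getOrCreate()

emp = spark.createDataFrame(
    [(1, "Ann", 10), (2, "Bob", 20), (3, "Eve", 99)],
    ["emp_id", "name", "dept_id"])
dept = spark.createDataFrame(
    [(10, "Sales"), (20, "Engineering"), (30, "HR")],
    ["dept_id", "dept_name"])

emp.join(dept, "dept_id", "inner").show()       # only rows with matching dept_id
emp.join(dept, "dept_id", "left").show()        # keep all employees
emp.join(dept, "dept_id", "full_outer").show()  # keep everything from both sides
emp.join(dept, "dept_id", "left_semi").show()   # employees that do have a department
emp.join(dept, "dept_id", "left_anti").show()   # employees with no matching department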
Blog | iamluminousmen
Introduction to Pyspark join types
DataFrames and Spark SQL API are the waves of the future in the Spark world. Here, I will push your Pyspark SQL knowledge into using different types of joins
Your top skill
The most important skill that will keep you relevant is the ability to solve problems.
Not the list of programming languages you know, not the ability to close tickets, not knowledge of a bazillion algorithms, not Scrum Master certificates.
It is the ability to take a real problem and solve it yourself that is the main skill of a professional. Solving it yourself does not mean solving it alone; it means being able to find the necessary resources and people, set tasks, control the result, and take responsibility for it.
Such people will always be needed, in any field and at any age.
#dev #soft_skills
Continuous integration and continuous delivery are like vectors that point in the same direction but have different magnitudes. The goal of both practices is the same: to make software development and releases more reliable, and to speed them up. Let me tell you how
https://luminousmen.com/post/continuous-Integration-continuous-delivery
Blog | iamluminousmen
Continuous Integration & Delivery main ideas
The goal of the CI/CD processes is to increase reliability and speed up software development while maintaining quality
Stop Installing Tensorflow using pip for performance sake!
There are two pretty good reasons why you should install Tensorflow using conda instead of pip. For those of you not in the know, conda is an open source, cross-platform package and environment management system.
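For reference, the conda route looks roughly like this (the environment name is arbitrary; the article's argument is that the conda packages are built against Intel MKL, which is where the speedup comes from):

conda create -n tf_env tensorflow        # CPU build
conda activate tf_env
# or, for a CUDA-enabled build:
# conda create -n tf_env tensorflow-gpu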
https://towardsdatascience.com/stop-installing-tensorflow-using-pip-for-performance-sake-5854f9d9eb0c
#python
Medium
Stop Installing Tensorflow using pip for performance sake!
Get 8X the speed boost with the conda installation compared to the pip installation.