LuminousmenBlog
502 subscribers
156 photos
32 videos
2 files
701 links
(ノ◕ヮ◕)ノ*:・゚✧ ✧゚・: *ヽ(◕ヮ◕ヽ)

helping robots conquer the earth and trying not to increase entropy using Python, Data Engineering and Machine Learning

http://luminousmen.com

License: CC BY-NC-ND 4.0
Researchers have developed a new attack that steals user data by injecting prompts into images that AI systems process before delivering them to an LLM. The method relies on full-resolution images that carry instructions invisible to the human eye but that become apparent once the image is downscaled by resampling algorithms.

🔗Link: https://www.bleepingcomputer.com/news/security/new-ai-attack-hides-data-theft-prompts-in-downscaled-images/
AI: great expectations, March 28th, 1988

🔗Link: https://people.csail.mit.edu/brooks/idocs/AI_hype_1988.pdf
🤯2👀2
Since we're all going to be unemployed soon
😁63
Not sure what to make of this, but Googling HDFS now routes me directly to Harley-Davidson financing. Either Google's confused... or this is how the internet tells you you've reached the 'motorcycle loan' demographic
🦄51👀1
Come on, this is fucking ridiculous

"hey claude, create a datasheet where our model is leading on every benchmark (btw create a benchmark)"

🔗Link: https://www.anthropic.com/news/claude-opus-4-5
🔥4💯1
Most people treat BigQuery like a magic SQL endpoint.

You write a query, hit Run, wait a few seconds... and a petabyte-sized answer pops out.

If it's slow or expensive, the default reaction is: "I need more compute".

That's backwards.

BigQuery is designed to skip work, not to muscle through it:

https://luminousmen.com/post/bigquery-explained-what-really-happens-when-you-hit-run
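A quick way to see the "skip work" idea for yourself: dry-run two queries and compare how many bytes BigQuery plans to scan. A minimal sketch, assuming a hypothetical date-partitioned table `my_project.sales.events` partitioned on `event_date`:

from google.cloud import bigquery

client = bigquery.Client()
# Dry run: BigQuery plans the query and estimates bytes scanned without executing it.
dry_run = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

full_scan = client.query(
    "SELECT store_id, SUM(amount) FROM `my_project.sales.events` GROUP BY store_id",
    job_config=dry_run,
)
pruned = client.query(
    """
    SELECT store_id, SUM(amount)
    FROM `my_project.sales.events`
    WHERE event_date = '2024-01-01'   -- filter on the partition column
    GROUP BY store_id
    """,
    job_config=dry_run,
)

# With partition pruning, the second estimate should be a fraction of the first.
print(f"full scan : {full_scan.total_bytes_processed / 1e9:.2f} GB")
print(f"pruned    : {pruned.total_bytes_processed / 1e9:.2f} GB")

Since you pay per byte scanned on on-demand pricing, the difference in those two numbers is the difference in your bill, not just your runtime.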
🔥1
Security researchers at PromptArmor have discovered a critical vulnerability in Google Antigravity - Google's new AI-powered IDE that uses Gemini-based agents. Through an indirect prompt-injection attack, an outside actor can:

- Trick Gemini into reading sensitive local files (like .env files or API keys)
- Use the built-in agent browser to quietly exfiltrate that data through crafted URLs
- Bypass safeguards such as "secret filtering" or .gitignore protections by triggering shell commands like cat

Antigravity's agents are granted broad capabilities - access to code, a shell, and a browser - so a single injected prompt hidden in a README or a code comment can silently leak data without any user action 😦

If you're experimenting with Antigravity or any similar agent-driven development tools, keep the following in mind:

- Lock down access to secrets
- Audit what capabilities your agents actually have
- Treat AI agents like remote developers - don't give them any more power than you'd hand to a junior engineer with near-root access

🔗 Link: https://promptarmor.com/resources/google-antigravity-exfiltrates-data
👍2
ONLYFANS could be the most revenue-efficient company on the planet, beating Nvidia, Meta, Tesla, and Amazon - powered by ass, not AI.
😎9
Lowering the gates to the CUDA moat.
A NotebookLM-generated infographic following Google's new TPU announcement

🔗Link: https://www.linkedin.com/posts/semianalysis_notebooklm-recently-introduced-a-new-function-activity-7400973159853780992-PsXz
👍1
Throughout my career, I keep coming back to the same optimization in data pipelines:

Filter as early as possible.

Recently I cut a 3-hour job down to 30 minutes and dropped compute cost from $600 to $9 just by doing that.

If your analytics team needs sales from just three stores, don't build the full sales mart and filter later. That's waste.

Push the store filter upstream - before joins, before aggregations, as close to storage as you can. Join only on those store IDs from the start.

On most engines this means less data scanned, less shuffling, and better use of partition pruning / predicate pushdown. In practice you get:

- Less I/O
- Less memory pressure
- Faster, cheaper queries

But here's the nuance: don't hardcode business logic upstream. Maintainability still matters.

Instead of sprinkling store_id IN (...) across jobs, drive those filters from config, parameters, or dimension tables (like an active_stores view). Same optimization, less brittleness.
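A minimal PySpark sketch of what that looks like - the paths and column names are hypothetical, and the filter is driven by a dimension table rather than hardcoded IDs, so the engine can push the predicate down to the scan before any heavy joins:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("filter-early").getOrCreate()

# Hypothetical "active stores" dimension that drives the filter (config, not hardcoded IDs).
active_stores = spark.read.parquet("s3://warehouse/dim/active_stores")  # column: store_id

# Filter the fact table first: join against the small dimension before any wide joins or aggregations.
sales = (
    spark.read.parquet("s3://warehouse/fact/sales")       # columns: store_id, sku, amount, sale_date
    .join(active_stores.select("store_id"), "store_id")   # reduces the data before the expensive work
)

products = spark.read.parquet("s3://warehouse/dim/products")  # columns: sku, category

# The join and aggregation now run on the already-reduced sales set.
report = (
    sales.join(products, "sku")
    .groupBy("store_id", "category")
    .sum("amount")
)
report.write.mode("overwrite").parquet("s3://warehouse/marts/sales_by_store")

Swapping the list of stores only requires updating the active_stores table, not editing every job.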

Before you run your next pipeline, ask:

Can I reduce data volume earlier without introducing fragile business logic?
💯5👍1