I created a deployment diagram for the Airflow workers under AWS.


This DAG automatically refreshes Docker ECR (Elastic Container Registry) authentication tokens in Apache Airflow. ECR tokens expire every 12 hours, so this DAG runs twice daily to ensure continuous access to your Docker registry.
Previously there was a Session object with which I could update the database. But now that method is forbidden, and the only way is to create an API user and use it to update the connection.
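For context, this is roughly what the old, now-forbidden approach looked like in Airflow 2. It is only a sketch; the helper name is mine, not part of the DAG below.

# Old Airflow 2.x style: write to the metadata DB through the ORM session.
# This is exactly what Airflow 3 no longer allows from task code.
from airflow import settings
from airflow.models import Connection

def update_docker_password_via_session(new_password: str) -> None:
    session = settings.Session()
    conn = (
        session.query(Connection)
        .filter(Connection.conn_id == "docker_default")
        .one()
    )
    conn.set_password(new_password)  # stores the new ECR token as the password
    session.commit()
    session.close()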

The DAG performs three main tasks:
- Extracts a fresh ECR authorization token with boto3
- Updates the docker_default connection with the new token using JWT authentication
- Tests the updated Docker connection

The AWS credentials used by the workers need permission to request ECR authorization tokens:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken"
      ],
      "Resource": "*"
    }
  ]
}

Set the following Airflow Variables in the Admin UI or via CLI:
# Via Airflow CLI
airflow variables set ecr_aws_account "123456789012" # Your AWS account ID
airflow variables set ecr_aws_region_name "us-east-1" # Your ECR region
Or via Airflow UI:
Create ecr_aws_account with your AWS account ID and ecr_aws_region_name with your ECR region.

An API user and connection are required because Airflow no longer allows access to the metadata database via the session object; this is new in Airflow 3.
Create a connection for the Airflow API:
Via Airflow UI:
- Connection ID: airflow-api
- Connection Type: HTTP
- Host: localhost (or your Airflow webserver host)
- Schema: http (or https if using SSL)
- Port: 8080 (or your Airflow webserver port)
- Login / Password: the credentials of your Airflow API user

Via CLI:
airflow connections add airflow-api \
--conn-type http \
--conn-host localhost \
--conn-schema http \
--conn-port 8080 \
--conn-login your_username \
--conn-password your_password
Create/update the Docker connection that will be updated:
Via Airflow UI:
- Connection ID: docker_default
- Connection Type: Docker
- Host: your ECR registry (e.g., 123456789012.dkr.ecr.us-east-1.amazonaws.com)
- Login: AWS
- Password: (this will be updated by the DAG)

Via CLI:
airflow connections add docker_default \
--conn-type docker \
--conn-host 123456789012.dkr.ecr.us-east-1.amazonaws.com \
--conn-login AWS
The DAG ID is refresh_docker_token_v4. The DAG uses a systemqueue pool. Create it:
Via Airflow UI:
Create a pool named systemqueue with appropriate slots (e.g., 5).

Via CLI:
airflow pools set systemqueue 5 "System maintenance tasks"
- Schedule: 55 5,17 * * * (runs at 5:55 AM and 5:55 PM daily)
- Tags: ["airflow", "docker", "ecr"]
- Make sure /var/run/docker.sock is accessible on the workers.
"""
DAG to refresh Docker ECR authentication token
Updates the docker_default connection with fresh ECR credentials using JWT authentication
You should not have your own: ~/.docker/config.json
"""
import base64
import logging
from datetime import datetime, timedelta
from typing import Any, Dict
import boto3
import requests
from airflow.decorators import dag, task
from airflow.hooks.base import BaseHook
from airflow.models.variable import Variable
from airflow.providers.docker.hooks.docker import DockerHook
logger = logging.getLogger("ecr_docker_token_refresh")
ecr_aws_account = Variable.get("ecr_aws_account")
ecr_aws_region_name = Variable.get("ecr_aws_region_name")
default_args = {
"retry_delay": timedelta(minutes=1),
"depends_on_past": False,
"retries": 2,
"email_on_failure": False,
"email_on_retry": False,
"queue": "systemqueue",
"pool": "systemqueue",
}
connection_id = "docker_default"
airflow_api_connection_id = "airflow-api"
def get_jwt_token(endpoint_url: str, username: str, password: str) -> str:
"""Get JWT token from Airflow API"""
auth_url = f"{endpoint_url}/auth/token"
payload = {"username": username, "password": password}
headers = {"Content-Type": "application/json"}
logger.info(f"Requesting JWT token from {auth_url}")
response = requests.post(auth_url, json=payload, headers=headers)
response.raise_for_status()
token_data = response.json()
access_token = token_data.get("access_token")
if not access_token:
raise ValueError("No access_token found in response")
logger.info("Successfully obtained JWT token")
return access_token
def update_connection_password_with_jwt(
endpoint_url: str, jwt_token: str, password: str
) -> bool:
"""Update connection password using JWT token with v2 bulk API"""
url = f"{endpoint_url}/api/v2/connections"
# First, get the current connection to preserve other fields
get_url = f"{endpoint_url}/api/v2/connections/{connection_id}"
headers = {
"Content-Type": "application/json",
"Authorization": f"Bearer {jwt_token}",
}
logger.info(f"Getting current connection {connection_id}")
try:
get_response = requests.get(get_url, headers=headers)
get_response.raise_for_status()
current_connection = get_response.json()
logger.info(f"Current connection retrieved successfully")
# Prepare bulk update payload using v2 API
payload = {
"actions": [
{
"action": "update",
"entities": [
{
"connection_id": connection_id,
"conn_type": current_connection.get("conn_type", "docker"),
"password": password, # This is what we're updating
}
],
"action_on_non_existence": "fail",
}
]
}
logger.info(f"Updating connection {connection_id} at {url}")
response = requests.patch(url, json=payload, headers=headers)
response.raise_for_status()
response_data = response.json()
logger.info(f"Bulk update response: {response_data}")
# Check if update was successful
update_results = response_data.get("update", {})
success_count = len(update_results.get("success", []))
error_count = len(update_results.get("errors", []))
if success_count > 0 and error_count == 0:
logger.info("Connection password updated successfully")
return True
else:
logger.error(
f"Update failed - Success: {success_count}, Errors: {error_count}"
)
if error_count > 0:
logger.error(f"Errors: {update_results.get('errors', [])}")
return False
except requests.exceptions.RequestException as e:
logger.error(f"Failed to update connection: {e}")
if hasattr(e, "response") and e.response is not None:
logger.error(f"Response status: {e.response.status_code}")
logger.error(f"Response text: {e.response.text}")
raise
@dag(
default_args=default_args,
schedule="55 5,17 * * *",
start_date=datetime.now() - timedelta(days=1),
max_active_runs=1,
catchup=False,
tags=["airflow", "docker", "ecr"],
dag_id="refresh_docker_token_v4",
description="Refresh Docker ECR token using JWT authentication",
)
def refresh_docker_token():
@task(priority_weight=5, pool="systemqueue")
def extract_ecr_token() -> Dict[str, Any]:
"""Extract ECR authorization token using boto3"""
logger.info("Starting ECR token extraction")
try:
logger.info(f"Connecting to ECR in region {ecr_aws_region_name}")
ecr_client = boto3.client("ecr", region_name=ecr_aws_region_name)
logger.info(f"Requesting authorization token for account {ecr_aws_account}")
response = ecr_client.get_authorization_token(registryIds=[ecr_aws_account])
auth_data = response["authorizationData"][0]
token = auth_data["authorizationToken"]
registry_url = auth_data["proxyEndpoint"]
expires_at = auth_data["expiresAt"]
logger.info("Successfully retrieved token")
logger.info(f"Registry URL: {registry_url}")
logger.info(f"Token expires at: {expires_at}")
decoded_token = base64.b64decode(token).decode()
username, password = decoded_token.split(":", 1)
logger.info(f"Decoded username: {username}")
return {
"registry_url": registry_url,
"username": username,
"password": password,
"expires_at": expires_at.isoformat(),
"raw_token": token,
}
except Exception as e:
logger.error(f"Failed to extract ECR token: {str(e)}")
raise
@task(priority_weight=5, pool="systemqueue")
def update_docker_connection(token_data: Dict[str, Any]) -> str:
"""Update Docker connection using JWT authentication"""
logger.info("Starting Docker connection update using JWT authentication")
logger.info("Token data received from previous task")
try:
# Get Airflow API connection details
logger.info(
f"Retrieving Airflow API connection: {airflow_api_connection_id}"
)
api_connection = BaseHook.get_connection(airflow_api_connection_id)
endpoint_url = f"{api_connection.schema}://{api_connection.host}"
if api_connection.port:
endpoint_url += f":{api_connection.port}"
username = api_connection.login
password = api_connection.password
logger.info(f"Using endpoint: {username} @ {endpoint_url}")
jwt_token = get_jwt_token(endpoint_url, username, password)
success = update_connection_password_with_jwt(
endpoint_url, jwt_token, token_data["password"]
)
if success:
return "SUCCESS: Docker connection updated successfully using JWT authentication"
else:
raise Exception("Failed to update connection")
except Exception as e:
logger.error(f"Failed to update Docker connection: {str(e)}")
raise
@task(priority_weight=3, pool="systemqueue")
def test_docker_connection() -> str:
"""Test the updated Docker connection"""
logger.info("Testing DockerHook...")
try:
# First get the connection details to debug
connection = BaseHook.get_connection("docker_default")
docker_hook = DockerHook(
docker_conn_id="docker_default", base_url="unix://var/run/docker.sock"
)
logger.info("DockerHook created successfully")
# Try to get docker client (this will test the connection more thoroughly)
docker_client = docker_hook.get_conn()
logger.info("Docker client connection established", docker_client.version())
return "SUCCESS: Docker connection tested and working with DockerHook"
except Exception as client_error:
logger.error(f"Docker client test failed: {client_error}")
# Show connection properties on failure
try:
connection = BaseHook.get_connection("docker_default")
except Exception as conn_error:
logger.error(f"Could not retrieve connection properties: {conn_error}")
return f"FAILED: Docker connection test failed: {client_error}"
# Task flow
token_data = extract_ecr_token()
update_result = update_docker_connection(token_data)
test_result = test_docker_connection()
token_data >> update_result >> test_result
refresh_docker_token_dag = refresh_docker_token()
It’s easy. I thought it would be hard.
First, let's install the Postgres tooling for Ubuntu:
sudo apt install -y postgresql-common
Then let's add and enable the PostgreSQL APT repository:
sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
Install the latest PostgreSQL:
apt install postgresql
See all clusters:
pg_lsclusters
dpkg --get-selections | grep postgres
There will be a new cluster. Rename it to main_pristine:
pg_renamecluster 17 main main_pristine
Choose which cluster to upgrade:
sudo pg_upgradecluster 13 main
Because pg_upgradecluster comes from the new package, it will magically create a new cluster next to the old one, and you will have something like:
# pg_lsclusters
Ver Cluster Port Status Owner Data directory Log file
13 main 5432 online postgres /var/lib/postgresql/13/main /var/log/postgresql/postgresql-13-main.log
17 main_pristine 5434 online postgres /var/lib/postgresql/17/main_pristine /var/log/postgresql/postgresql-17-main_pristine.log
# sudo pg_upgradecluster 13 main
.... a lot of upgrading
# pg_lsclusters
Ver Cluster Port Status Owner Data directory Log file
13 main 5435 down postgres /var/lib/postgresql/13/main /var/log/postgresql/postgresql-13-main.log
17 main 5432 online postgres /var/lib/postgresql/17/main /var/log/postgresql/postgresql-17-main.log
17 main_pristine 5434 online postgres /var/lib/postgresql/17/main_pristine /var/log/postgresql/postgresql-17-main_pristine.log
Notice that the port and everything is in place! Nice!
Finally, let's do some cleanup:
pg_dropcluster 13 main --stop
pg_dropcluster 17 main_pristine --stop
This article would not be possible without:
https://gorails.com/guides/upgrading-postgresql-version-on-ubuntu-server
https://www.directedignorance.com/blog/upgrading-postgresql-14-to-16-on-ubuntu
I know this article contains a lot of text, but trust me, it’s absolutely worth reading—you’ll become much more productive!
Use Case:
You encounter an incorrect file path and need to locate the issue. Instead of scrutinizing the path segment by segment, a more effective approach is to list the path and start trimming it from the end until you find the correct segment.
This method will save you mental effort and reduce eye strain.
Example:
You receive an error when attempting to open the following file:
/home/user/projects/pizza/seed/images/themes/pizza/01.jpg
To resolve the issue:
ls -l /home/user/projects/pizza/seed/images/themes/pizza/
ls -l /home/user/projects/pizza/seed/images/themes/
ls -l /home/user/projects/pizza/seed/images/
ls -l /home/user/projects/pizza/seed/
ls -l /home/user/projects/pizza/
Then you can easily spot that the “seed” segment is not “seed” but “seeds”.
Final check:
ls -l /home/user/projects/pizza/seeds/images/themes/pizza/01.jpg
It works!
If you need to locate something, use the “Find” shortcut instead of scrolling and reading. The “Find” command is much faster and allows you to search for variables, class names, or even partial names.
If your error is on line 459, you can quickly navigate there using a shortcut. Simply type a number close to 459, such as 450, and you’ll instantly see line 459 along with the numbered lines around it.
Use Case: To minimize distractions, use code folding to hide parts of the code you’re not currently working on.
Example:
In most code editors, you can collapse code blocks by clicking the small arrow next to the line numbers. This helps you focus on the part of the code you’re currently working on.
To compare two branches you can use git diff ...branch-name, but this requires a lot of effort.
A good way to deal with that is to clone another copy of the repo and have two repositories locally.
Usually I have a “project-name” repository folder and a “project-name-other” repository folder.
Then when I want to compare I use some GUI to do the job. Mine is meld.
meld project-name/ project-name-other/
Sometimes you want to change to the directory of a file. My way of doing that is to copy the currently opened file's path from the editor with a shortcut, grab the whole file path into the clipboard, and then do cdf … like this:
cdf /home/user/some-project/some-folder/file_name.extension
Here is what cdf looks like:
cdf() {
    if [ $# -eq 0 ]; then
        echo "No file path provided."
        return 1
    fi

    # Join all arguments with spaces
    local full_path="$*"

    if [ -f "$full_path" ]; then
        # If it's a file, extract the directory path
        local dir_path
        dir_path=$(dirname "$full_path")
    elif [ -d "$full_path" ]; then
        # If it's a directory, use it directly
        local dir_path="$full_path"
    else
        echo "The path provided is neither a file nor a directory."
        return 1
    fi

    # Change to the directory
    cd "$dir_path" || {
        echo "Failed to change directory to $dir_path"
        return 1
    }
}

To be able to deploy fast in production and development, the configuration for dbt should be set via environment variables.
This allows us to switch quickly between profiles, schemas, variables, and more.
What we are doing today is to have an .envrc holding all the configuration for dbt.
For example, in development:
# Snowflake account
export SNOWFLAKE_ACCOUNT=YOURACCOUNT.us-east-1
export SNOWFLAKE_WAREHOUSE=YOURWAREHOUSE
export DBT_PROFILE=development
export DBT_ROLE=DEVELOPMENT
# Tenant / Used for the query tagging
export TENANT_NAME=CLIENT_NAME
# Source
export DBT_SOURCE_DATABASE=RAW_ZONE
export DBT_SOURCE_SCHEMA=${TENANT_NAME}
# Target
export DBT_TARGET_DATABASE=STANDARD_ZONE_DEVELOPMENT
export DBT_TARGET_SCHEMA=${TENANT_NAME}_${SNOWFLAKE_USER}
Note how we add SNOWFLAKE_USER to the target schema, so that users will not overwrite each other's work if they want to work on the same client.
To make this work, this is what we have added to profiles.yml:
development:
  target: development
  outputs:
    development:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      warehouse: "{{ env_var('SNOWFLAKE_WAREHOUSE') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: "{{ env_var('DBT_ROLE') }}"
      database: "{{ env_var('DBT_TARGET_DATABASE') }}"
      schema: "{{ env_var('DBT_TARGET_SCHEMA') }}"
      threads: 8

production:
  target: production
  outputs:
    production:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: "{{ env_var('DBT_ROLE') }}"
      warehouse: "{{ env_var('SNOWFLAKE_WAREHOUSE') }}"
      database: "{{ env_var('DBT_TARGET_DATABASE') }}"
      schema: "{{ env_var('DBT_TARGET_SCHEMA') }}"
      threads: 8

And to glue everything together, you need to switch to environment variables in dbt_project.yml. Note that this feature is not supported on old dbt versions; the documentation states that environment variables can be used in dbt_project.yml.
dbt_project.yml
profile: "{{ env_var('DBT_PROFILE', 'development') }}"
vars:
  source_database: "{{ env_var('DBT_SOURCE_DATABASE') }}"
  source_schema: "{{ env_var('DBT_SOURCE_SCHEMA') }}"
So the final command will be:
export DBT_PROFILES_DIR="."
dbt --no-anonymous-usage-stats run
While using a justfile:
dbt:
    #!/bin/bash
    echo "SOURCE: $DBT_SOURCE_DATABASE / $DBT_SOURCE_SCHEMA"
    echo "TARGET: $DBT_TARGET_DATABASE / $DBT_TARGET_SCHEMA"
    export DBT_PROFILES_DIR="{{invocation_directory()}}"
    poetry run \
        dbt --no-anonymous-usage-stats run \
        --fail-fast

Install the "dbt Power User" extension as usual.
Open the VS Code preferences as JSON.
Point the profilesDir to the current folder like this:
"dbt.profilesDirOverride": "${workspaceFolder}",
Associate the SQL files to be treated as Jinja templates:
"files.associations": {
"*.sql": "jinja-sql"
},
And then make sure you are using the correct Python environment.
Enjoy the dbt!
eXtreme Go Horse :: #methodologies, #architecture
haha article https://medium.com/@noriller/sprints-the-biggest-mistake-of-software-engineering-34115e7de008
wireguard readings :: #wireguard
how to fix wireguard connection by changing mtu https://keremerkan.net/posts/wireguard-mtu-fixes/
collection of wireguard docs and tools – https://github.com/pirate/wireguard-docs
Cold starts and lambdas :: #lambda, #aws
https://aaronstuyvenberg.com/posts/understanding-proactive-initialization
Storage-First Pattern :: #aws, #lambda
Store the request, then process it. https://dev.to/aws-builders/serverless-patterns-4439
https://cbannes.medium.com/decoupling-microservices-with-aws-eventbridge-pipes-3cef3a1dfce7
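To make the "store the request, then process it" idea concrete, here is a minimal sketch of an API-facing Lambda handler that only persists the request and acknowledges it; the table name and fields are illustrative, not taken from the linked posts.

import json
import uuid
import boto3

dynamodb = boto3.resource("dynamodb")
requests_table = dynamodb.Table("incoming-requests")  # hypothetical table name

def handler(event, context):
    """Storage-first: persist the raw request, acknowledge it, process later."""
    record_id = str(uuid.uuid4())
    requests_table.put_item(
        Item={
            "id": record_id,
            "body": event.get("body", ""),
            "status": "PENDING",  # a separate consumer picks up PENDING items
        }
    )
    # The request is now durable even if downstream processing fails.
    return {"statusCode": 202, "body": json.dumps({"id": record_id})}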
AWS Badges and certificates :: #aws
I think these are free learning resources, and maybe you can obtain the badge as well. https://aws.amazon.com/training/badges/
Refactoring :: #architecture
Nice website with a lot of information on refactoring, including design patterns: https://refactoring.guru/
CMS :: #api, #cms
CMS As API
https://decapcms.org/docs/i18n/
Free AI courses from Nvidia :: #ai, #learning
https://www.kdnuggets.com/free-ai-courses-from-nvidia-for-all-levels
Paid and free learning :: #learning, #javascript, #css
Nice CSS, JavaScript, and Figma tutorials https://v2.scrimba.com/home
CSS for fast quick website :: #css
https://matcha.mizu.sh/#input
Learn CSS :: #css tips
https://css-tip.com/better-modern-css/
Alternative to screen :: #linux, #cli
zellij.dev
How to use paid models for free :: #ai
How to use paid AI for free https://www.kdnuggets.com/5-ways-to-access-gpt-4o-for-free
The “Git Quick” extension for Visual Studio Code is designed to streamline your workflow with Git by providing instant commands for staging, committing, and restoring files directly from the editor. This review explores its main functionalities and benefits, as well as potential areas for future improvements.
One of the standout features of "Git Quick" is the git-quick-commit command. This command allows you to commit the file currently in focus with remarkable speed.
This feature is particularly useful for developers who need to make frequent commits without losing focus on their current task.
The git-quick-restore command is another powerful feature of Git Quick. It allows you to quickly revert the current file to its state at the most recent commit. This is equivalent to discarding all local changes made to the file.
git-quick-checkout: This is an alias for the git-quick-restore command, providing flexibility in how you interact with the extension.
Git Quick enhances the Git workflow by minimizing interruptions and keeping you in your coding environment. The automatic saving of files and seamless integration with VS Code's Git extension make it a natural part of the development process.
The current feature set of Git Quick is already impressive, but the promise of additional "quick" commands in the future makes it an exciting tool to watch. Potential future enhancements could include quick variants of other common Git operations.
Download link https://marketplace.visualstudio.com/items?itemName=gudasoft.git-quick
The Git Quick extension for VS Code is a highly efficient tool for developers looking to speed up their Git workflow. With instant staging, committing, and restoring capabilities, it reduces the friction of version control tasks and keeps you focused on coding. As it stands, it’s a valuable addition to any developer’s toolkit, with promising features on the horizon.
For more information and to download the extension, visit the Git Quick repository. Also, check out other great projects from Gudasoft!
How to open source a Python project :: #python
https://jonathanadly.com/open-sourcing-a-python-project-the-right-way-in-2024
Static analysis of python :: #python, #security
https://publications.waset.org/10013441/static-analysis-of-security-issues-of-the-python-packages-ecosystem
Snyk has limited free usage per month :: #python, #security
https://snyk.io/
Clean Architecture :: #svelte, #architecture
Good reading with resources and examples https://newsletter.techworld-with-milan.com/p/what-is-clean-architecture
It looks like ChatGPT is good at showing example project structures for hexagonal, clean, and vertical design.
I used the following prompt “show sample file/folder structure for svelte application following the vertical design”
Vertical slice architecture – https://www.jimmybogard.com/vertical-slice-architecture/
svelte example – https://github.com/tedesco8/svelte-clean-arquitecture
OData – Open Data Protocol :: #api, #REST
🔗 https://www.odata.org/
When talking about REST, there is a need for standardized communication.
OData is a standardization of RESTful APIs, offering a uniform way to query and manipulate data. It provides a rich set of query capabilities directly in the URL, allowing for complex querying, filtering, sorting, and paging without additional custom endpoints.
For example, encoding the params the same way in the URLs:
– $filter: Filters results based on conditions (e.g., get customers with an email containing “@example.com”).
– $orderby: Sorts results based on specific properties (e.g., order products by price ascending).
– $select: Selects only specific properties of an entity (e.g., retrieve just name and category from products).
– $top and $skip: Limits the number of returned entities (e.g., get the first 10 customers or skip the first 20 and retrieve the rest).
– $expand: Retrieves related entities along with the main entity (e.g., get a customer with their associated orders).
There is a nice [tutorial](https://www.odata.org/getting-started/basic-tutorial/).
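As a rough illustration of how these options combine in a request, here is a minimal Python sketch against a hypothetical OData service (the base URL and entity set are made up, not from the links above):

import requests

# Hypothetical OData endpoint exposing a Customers entity set
base_url = "https://example.com/odata/Customers"

params = {
    "$filter": "contains(Email, '@example.com')",  # only matching customers
    "$select": "Name,Email",                       # just two properties
    "$orderby": "Name asc",                        # sorted by name
    "$top": "10",                                  # first 10 results
    "$expand": "Orders",                           # include related orders
}

# requests URL-encodes the $-prefixed query options for us
response = requests.get(base_url, params=params)
response.raise_for_status()
print(response.json())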
Self-hosted list with nice projects :: #self-hosted
Blog with nice projects https://noted.lol/
Send money :: #crypto, #bitcoin
https://www.uma.me/
k6 :: #benchmark, #api, #data-listener
Nice and easy tool for benchmarking and load testing
https://k6.io/open-source/
Sikuli :: #testing
https://raiman.github.io/SikuliX1/downloads.html
https://github.com/glitchassassin/lackey
Starting today, we will publish some interesting links found during our software development practice.
Let's start.
Guy built his CV as a game. :: #tutorials, #game, #javascript
It is not user friendly, so don't do it, but it is a nice tutorial.
Game/CV at: https://jslegenddev.github.io/portfolio/
Video at: https://www.youtube.com/watch?v=wy_fSStEgMs
A lot of game tutorials at his youtube channel https://www.youtube.com/@jslegenddev
Golang youtube channel. :: #tutorials, #youtube, #golang
https://www.youtube.com/@MelkeyDev
Detective social game walk-through. :: #youtube, #games
Fun to watch. Nice game review channel.
CUID2 vs UUID. :: #web, #security
CUID2 – shorter, no collisions, hard to generate
Example CUID2: c00-v4-abcdefgh12345678
UUID – longer, collisions possible, easy to generate
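A quick way to feel the difference is to generate both in Python. The uuid module is standard library; the cuid2 import below assumes the third-party cuid2 package and its Cuid().generate() API, so check the package docs before relying on it.

import uuid

# UUID4: 36 characters, trivial to generate, collisions astronomically unlikely
print(uuid.uuid4())  # e.g. '3f2b9a1e-8c4d-4e5f-9a6b-7c8d9e0f1a2b'

# CUID2: shorter, lowercase, URL-friendly identifiers.
# Assumes `pip install cuid2` exposing Cuid().generate() (verify against the package docs).
from cuid2 import Cuid
print(Cuid().generate())  # e.g. a ~24-character string like 'tz4a98xxat96iws9zmbrgj3a'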
Fun coding problems from easy to hard :: #tutorials, #interview
https://daily.dev/blog/fun-coding-problems-from-easy-to-hard
React portfolio website :: #javascript, #web-design
The guy stole a lot of elements and put together a portfolio. There is a nice discussion.
Also discovered this amazing website: https://wiscaksono.com/ and the wiscaksono-site GitHub repo; it uses wakatime.com for the coding statistics.
reflow :: #web, #css
A reminder that reflow has a negative effect on performance.
Richardson Maturity Model :: #web, #api
A [maturity model](https://en.wikipedia.org/wiki/Richardson_Maturity_Model) for REST APIs
– level 0 – random URLs
– level 1 – resources
– level 2 – resources + verbs
– level 3 – hypermedia controls (HATEOAS)
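A small sketch of what level 2 looks like in practice, using a hypothetical customers API (URL and payloads are illustrative):

import requests

BASE = "https://api.example.com"  # hypothetical level-2 REST API

# Level 2: one URL per resource, HTTP verbs carry the intent
customer = requests.get(f"{BASE}/customers/42").json()      # read
requests.put(f"{BASE}/customers/42", json={"name": "Ivo"})  # update
requests.delete(f"{BASE}/customers/42")                     # delete

# Level 0, by contrast, would POST every action to a single endpoint
# such as /api with the operation encoded in the body.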
© 2025 Ivo Bardarov