
Notify when some processes are finished

I have a case where I have to process thousands of files.

I have used GNU parallel to run them in batches, but I don't want to sit and watch until the processing is finished.
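For example, a hypothetical GNU parallel run that converts thousands of TIFF files to PNG, four at a time (the glob and the job count are made up for illustration):

# Convert every .tif in the tree to .png, four jobs at a time.
find . -name '*.tif' | parallel -j4 convert {} {.}.png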

Here is the script I use to check whether any processes named “convert” are still running:

#!/bin/bash

# Count the processes whose command line contains "convert".
# The grep itself always appears in the list, so when the count
# drops to 1 only the grep is left and the conversion is done.
number=$(ps aux | grep convert | wc -l)
echo "$number"
if [ "$number" -eq "1" ]; then
	telegram-send "finished converting"
	sleep 60
fi

Then run it inside a “screen” session:

watch -n 60 ./notify.sh

That way you will get a Telegram message every 60 seconds once the converting is finished.

telegram-send can be installed with pip install telegram-send
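It also needs a one-time configuration that points it at your Telegram bot:

telegram-send --configure    # prompts for the bot token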

Watch YouTube in the Background

The YouTube app is very rude and does not allow you to play videos in the background.

There was a nice solution with Shortcuts on iOS, but it occasionally stops working.

There are also a lot of Telegram bots which can help you download videos, but none of them is easy to use or lets you save bandwidth by choosing the quality/size of the stream.

That’s why I wrote a small Go Telegram bot to help me.

Here is how it works.

Install these apps on your Apple phone:

  • VLC – free open source app
  • Telegram – a popular chat program
  • YouTube app

Open Telegram and add/find the Bot with the name “Audio Helper (Youtube)”.

When you paste a link, the audio helper bot will ask you about the audio quality and will offer either to open the link or to use “more options”.

Click “more options” and you can choose between “download in VLC” and “stream in VLC”.

Next time you want to watch some video on a locked screen, share the clip directly to the Telegram bot.

Enjoy watching YouTube videos on a locked screen.

Creating an AWS-enabled local Spark

Install pyspark

We need to choose the Spark version. It should be 2.4 or newer; in our case it is 2.4.6.

The installation method is with conda:

conda install -c conda-forge pyspark=2.4.6

Install java

We need the right version of Java. There is a problem with the 1.8.0.272 build that comes with Amazon Linux 2, so we first have to remove that version and install an older one.

Query for the current installed openjdk:

rpm -qa | grep java

You will see something like:

java-1.8.0-openjdk-headless-1.8.0.272.b10-1.amzn2.0.1.x86_64
java-1.8.0-openjdk.x86_64 1:1.8.0.272.b10-1.amzn2.0.1

Then remove it with:

yum remove java-1.8.0-openjdk java-1.8.0-openjdk-headless

Going for Java 1.8.0.265

yum -v list java-1.8.0-openjdk-headless --show-duplicates
yum -v list java-1.8.0-openjdk --show-duplicates

yum install java-1.8.0-openjdk-1.8.0.265.b01-1.amzn2.0.1

The headless package will be installed as a dependency of the command above.

Update alternatives

alternatives --config java
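After picking the 265 build in the menu, you can double-check which JVM is now active:

# should report the 1.8.0_265 build
java -version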

AWS-enable the local Spark

Check what version of hadoop-common you have:

ls -l /opt/anaconda3/envs/advanced/lib/python2.7/site-packages/pyspark/jars/hadoop*
....
hadoop-common-2.7.3.jar
...

That means we have to stick to the AWS SDK that matches Hadoop 2.7.3: download hadoop-aws-2.7.3.jar and its dependency aws-java-sdk-1.7.4.jar. A great tutorial can be found here.
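If you want to fetch the two jars from the command line, the standard Maven Central layout should give you something like this (URLs assumed from the usual repository structure, verify before use):

wget -P /opt/jars https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.3/hadoop-aws-2.7.3.jar
wget -P /opt/jars https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar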

So the final code to get Spark running is:

import os

from pyspark.sql import SparkSession, SQLContext


def create_local_spark():
    # Jars copied over from an EMR cluster (EMRFS, LZO, s3-dist-cp, ...)
    jars = [
        "/opt/jars/hadoop-lzo-0.4.21-SNAPSHOT.jar",
        "/opt/jars/aopalliance-1.0.jar",
        "/opt/jars/bcprov-jdk15on-1.51.jar",
        "/opt/jars/ion-java-1.0.2.jar",
        "/opt/jars/jcl-over-slf4j-1.7.21.jar",
        "/opt/jars/slf4j-api-1.7.21.jar",
        "/opt/jars/bcpkix-jdk15on-1.51.jar",
        "/opt/jars/emrfs-hadoop-assembly-2.19.0.jar",
        "/opt/jars/javax.inject-1.jar",
        "/opt/jars/jmespath-java-1.11.129.jar",
        "/opt/jars/s3-dist-cp-2.7.0.jar",
        "/opt/jars/s3-dist-cp.jar",
        "/opt/jars/mysql-connector-java-5.1.39.jar",
    ]

    # The S3A connector matching hadoop-common 2.7.3, plus its AWS SDK
    aws_1 = [
        "/opt/jars/hadoop-aws-2.7.3.jar",
        "/opt/jars/aws-java-sdk-1.7.4.jar",
    ]

    jars_string = ",".join(jars + aws_1)
    pyspark_shell = "--jars {} --driver-memory 4G pyspark-shell".format(jars_string)

    # These must be set before the JVM starts, i.e. before getOrCreate()
    os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_shell
    os.environ["PYSPARK_PYTHON"] = "/opt/anaconda3/envs/advanced/bin/python"

    spark_session = SparkSession.builder.appName("ZZZZZ").getOrCreate()
    hadoop_conf = spark_session._jsc.hadoopConfiguration()

    # Use the S3A filesystem with V4 signing, SSE-S3 encryption and
    # credentials from the instance profile / default provider chain
    hadoop_conf.set("com.amazonaws.services.s3.enableV4", "true")
    hadoop_conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    hadoop_conf.set("fs.s3a.server-side-encryption-algorithm", "AES256")
    hadoop_conf.set("fs.s3a.aws.credentials.provider", "com.amazonaws.auth.InstanceProfileCredentialsProvider,com.amazonaws.auth.DefaultAWSCredentialsProviderChain")
    hadoop_conf.set("fs.AbstractFileSystem.s3a.impl", "org.apache.hadoop.fs.s3a.S3A")

    spark_context = spark_session.sparkContext
    sql_context = SQLContext(spark_context)
    # df = spark_session.read.json("s3a://hello/world/")
    return spark_context, sql_context


Backup whole hard disk with squashfs

A great blog post shows how to back up an SSD with squashfs and dd.
Here is a PDF version.
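The gist of the technique, as a minimal sketch (device name and target path are placeholders): mksquashfs can create a pseudo-file whose content is the stdout of a command, so dd can stream the raw disk straight into a compressed image with no intermediate file.

# Pack the whole disk /dev/sda into a compressed squashfs image.
mkdir empty
mksquashfs empty /mnt/backup/disk.squashfs \
    -p 'sda.img f 444 root root dd if=/dev/sda bs=4M'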



Know when and who is using SSH

I would like to know when someone uses SSH on my servers. That’s why I have set up a Telegram notification. Here is how it works.

Put this in /etc/ssh/sshrc (sshd runs it on every login):

#!/bin/bash

# Report the tty, connection details and logged-in user to Telegram,
# in the background so the login itself is not delayed.
telegram-send -g "Access $SSH_TTY $SSH_CONNECTION $(id)" &

Of course, you need to set up your telegram-send first.
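The -g flag sends to a group, which has its own one-time setup:

telegram-send --configure-group    # add the bot to the group first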

Compare two filesystems

Let’s run these commands on the two machines:

On the new instance:

find / -xdev | sort > new.txt

On the old instance:

find / -xdev | sort > old.txt

Pull the files locally

scp -i ~/.ssh/somekey ec2-user@10.1.22.1:/new.txt  /tmp/new.txt
scp -i ~/.ssh/somekey ec2-user@10.1.19.1:/old.txt  /tmp/old.txt

Then use the great delta tool to compare the files:

delta -s /tmp/new.txt /tmp/old.txt

Using md5 sums

And the same comparison using md5 sums (this is slow!):

# On the new instance

find / -xdev -type f -exec  md5sum {} \; > new-files.txt
find / -xdev -type d | sort > sorted.new-folders.txt
sort -k2 new-files.txt > sorted.new-files.txt

# On the old instance

find / -xdev -type f -exec  md5sum {} \; > old-files.txt
find / -xdev -type d | sort > sorted.old-folders.txt
sort -k2 old-files.txt > sorted.old-files.txt
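Then copy the sorted lists to one machine and compare them the same way as before; any differing line is a file that was added, removed or changed:

delta -s sorted.old-files.txt sorted.new-files.txt
delta -s sorted.old-folders.txt sorted.new-folders.txt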

How to pull files from remote sftp

Create a batch file with the commands to pull the files and then remove them from the remote server:

get -r upload/* incoming/
rm upload/*

You will need a cron job:

0 5 * * * /usr/bin/sftp -b batchfile.sh user@sftp.example.com

I recommend using systemd instead of cron so that you get logs.
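A minimal sketch of such a service/timer pair (unit names and paths are made up for illustration):

# /etc/systemd/system/sftp-pull.service
[Unit]
Description=Pull files from the remote sftp

[Service]
Type=oneshot
ExecStart=/usr/bin/sftp -b /etc/sftp-pull/batchfile.sh user@sftp.example.com

# /etc/systemd/system/sftp-pull.timer
[Unit]
Description=Run sftp-pull daily at 05:00

[Timer]
OnCalendar=*-*-* 05:00:00

[Install]
WantedBy=timers.target

Enable it with systemctl enable --now sftp-pull.timer; journalctl -u sftp-pull.service then shows the output that cron would have hidden.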
