Compare commits: e18e94bb23...primary (5 commits)

| SHA1 |
|---|
| 5cab66dfbb |
| 976370452e |
| cf33dc92f3 |
| 5997bcb6f8 |
| 7f5806c0a9 |

.gitignore (11 lines changed)

```diff
@@ -1,9 +1,12 @@
 *
-!psi-alerts.sh
-!psi-alerts@.service
-!psi-monitor.service
-!psi-by-example
 !.gitignore
 !CONFIGURE.md
 !INSTALL.md
 !README.md
+!psi-alerts-user.service
+!psi-alerts.sh
+!psi-alerts@.service
+!psi-by-example
+!psi-monitor-user.service
+!psi-monitor.service
+!psi-monitor.sh
```

CONFIGURE.md (31 lines changed)

````diff
@@ -1,19 +1,26 @@
-# CONFIGURE
-Included in this project are a number of systemd units:
+# CONFIGURE Included in this project are a number of systemd units:
 * psi-monitor.service
   * uses psi-monitor executable (in /usr/bin/)
-* psi-alerts@.service (system template service)
-  * uses psi-alerts.sh script
+* psi-alerts@.service (systemd template service)
+  * uses psi-alerts.sh script in */usr/local/bin/*
+* psi-alerts-user.service (systemd user service)
+  * also uses psi-alerts.sh script in *~/bin/* (or wherever you want to
+put it)
 
 The `psi-alerts.sh` is essentially a daemon (a systemd simple service), and for
 now the systemd template needs to be instantiated with the username that will
-execute `psi-alerts.sh`. Also, a systemd unit override should be created, like
-so:
+execute `psi-alerts.sh` (if using the systemd template). Also, a systemd unit
+override should be created, like so:
 
 ```
+sudo cp psi-alerts@.service /etc/systemd/system/
 sudo systemctl edit psi-alerts@<user>.service
 ```
+--OR--
+```
+cp psi-alerts-user.service ~/.config/systemd/user/psi-alerts.service
+systemctl --user edit psi-alerts.service
+```
 This will open an editor, and in later versions of systemd the comment code will be included, clearly showing where the override should be entered:
 
 ```
````

@ -32,17 +39,21 @@ Environment=SSH_HOST="localhost"
|
|||||||
Environment=SSH_PORT=5999
|
Environment=SSH_PORT=5999
|
||||||
Environment=SSH_ID_PATH="~user/.ssh/psi-alerts"
|
Environment=SSH_ID_PATH="~user/.ssh/psi-alerts"
|
||||||
Environment=CLEAR_THRESHOLD="5.0"
|
Environment=CLEAR_THRESHOLD="5.0"
|
||||||
|
ExecStart= # Clear ExecStart for user unit
|
||||||
|
ExecStart=/path/to/psi-alerts.sh --user # User unit
|
||||||
|
|
||||||
### Edits below this comment will be discarded
|
### Edits below this comment will be discarded
|
||||||
|
|
||||||
### /etc/systemd/system/psi-alerts@.service
|
### /etc/systemd/system/psi-alerts@.service
|
||||||
# [Unit]
|
# [Unit]
|
||||||
# Description=Pressure Stall Information (PSI) alerts
|
# Description=Pressure Stall Information (PSI) alerts
|
||||||
# PartOf=multi-user.target
|
# PartOf=multi-user.target # system template
|
||||||
|
# PartOf=default.target # user service
|
||||||
# After=psi-monitor.service
|
# After=psi-monitor.service
|
||||||
#
|
#
|
||||||
# [Service]
|
# [Service]
|
||||||
# User=%i
|
#
|
||||||
|
# User=%i # User unit will not have User=%i
|
||||||
# Type=simple
|
# Type=simple
|
||||||
# ExecStart=psi-alerts.sh
|
# ExecStart=psi-alerts.sh
|
||||||
#
|
#
|
||||||
@ -85,5 +96,5 @@ All of these are required except where noted, there are no default options
|
|||||||
(SMS and email will still work, as they don't use SSH)
|
(SMS and email will still work, as they don't use SSH)
|
||||||
* **CLEAR_THRESHOLD**: The percentage threshold the some avg300 threshold
|
* **CLEAR_THRESHOLD**: The percentage threshold the some avg300 threshold
|
||||||
should be below before considering the alert cleared. This will depend
|
should be below before considering the alert cleared. This will depend
|
||||||
highly on the workload running on
|
highly on the workload running on the system.
|
||||||
|
|
||||||
|
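For the user unit, the override can also be written without an interactive editor. The sketch below is an assumption-laden alternative to `systemctl --user edit psi-alerts.service`: it writes `override.conf` into the `psi-alerts.service.d/` drop-in directory (the location `systemctl edit` uses) and resets `ExecStart=` before setting it again. Two systemd rules matter here. Only lines that begin with `#` or `;` are comments, so trailing text such as `# Clear ExecStart for user unit` would become part of the value. And on current systemd versions a bare `ExecStart=psi-alerts.sh` is resolved against a fixed, build-time search path (directories like */usr/local/bin/* and */usr/bin/*), not *~/bin/*; the `%h` specifier expands to the home directory of the user running the user manager. The function name `write_psi_alerts_override`, the *~/bin/* location and the `CLEAR_THRESHOLD` value are placeholders.

```shell
# Sketch: create a drop-in override for the user-level psi-alerts.service
# non-interactively. write_psi_alerts_override is a hypothetical name and the
# values written below are placeholders to adapt from CONFIGURE.md.
write_psi_alerts_override () {
    # User units are read from $XDG_CONFIG_HOME/systemd/user
    # (default ~/.config/systemd/user); drop-ins live in <unit>.d/
    local config_home="${XDG_CONFIG_HOME:-$HOME/.config}"
    local dropin_dir="${config_home}/systemd/user/psi-alerts.service.d"
    mkdir -p "${dropin_dir}" || return 1

    # Quoted heredoc: nothing is expanded, so %h reaches systemd untouched.
    # Comments stay on their own lines because systemd does not strip
    # trailing "# ..." text from a value.
    cat > "${dropin_dir}/override.conf" <<'EOF'
[Service]
# Add the remaining Environment= lines described in CONFIGURE.md here
Environment=CLEAR_THRESHOLD="5.0"
# An empty ExecStart= clears the command inherited from the unit file
ExecStart=
# %h is the home directory of the user running the user manager
ExecStart=%h/bin/psi-alerts.sh --user
EOF
    # Print the path of the file that was written
    printf '%s\n' "${dropin_dir}/override.conf"
}

# Usage; a manually written drop-in needs a reload to take effect:
#   write_psi_alerts_override
#   systemctl --user daemon-reload
```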
INSTALL.md (56 lines changed)

````diff
@@ -4,4 +4,60 @@ First, clone this repository with the `--recurse-submodules` flag:
 $ git clone --recurse-submodules https://git.eldon.me/trey/psi-alerts.git
 ```
 
+`--recurse-submodules` is only necessary if you wish to use the modified
+psi-by-example program for `psi-monitor`. I found this too noisy to be of use,
+it alerts too quickly so I wrote my own with relaxed timing.
+
+If you want to use the psi-by-example/psi-monitor code, you'll need to compile
+it:
+
+```
+gcc -o psi-monitor psi-monitor.c
+```
+
+## Using the systemd template unit
+1. Copy the `psi-alerts.sh` and `psi-monitor.sh` scripts to */usr/local/bin*:
+
+```
+sudo cp psi-alerts.sh /usr/local/bin
+sudo cp psi-monitor.sh /usr/local/bin/psi-monitor
+### OR ###
+sudo cp psi-by-example/psi-monitor /usr/local/bin
+```
+
+2. Copy the systemd units to */etc/systemd/system*:
+
+```
+sudo cp psi-alerts@.service psi-monitor.service /etc/systemd/system/
+```
+
+
+## Using the systemd user units
+1. Copy the `psi-alerts.sh` and `psi-monitor.sh` scripts to *~/bin* (or
+wherever you want them):
+
+```
+cp -a psi-alerts.sh psi-monitor.sh ~/bin/
+```
+
+2. Copy the systemd user units to *~/.config/systemd/user/*
+
+```
+cp psi-alerts-user.service ~/.config/systemd/user/psi-alerts.service
+cp psi-monitor-user.service ~/.config/systemd/user/psi-monitor.service
+```
+
+# CONFIGURE
+See *CONFIGURE.md* in this repository
+
+# ENABLE and START
+## system template instance:
+```
+sudo systemctl enable --now psi-monitor.service psi-alerts@<user>.service
+```
+
+## User instance
+```
+systemctl --user enable --now psi-monitor.service psi-alerts.service
+```
 
````

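The user-unit steps copy into *~/bin/* and *~/.config/systemd/user/*, and `cp` fails if either directory does not exist yet, which is common on a fresh account. A minimal sketch of the same two steps that creates the directories first, assuming it is run from the repository root and that *~/bin/* is the chosen script directory; `install_user_units` is a hypothetical name, while the file names and the `-user` suffix renaming follow INSTALL.md.

```shell
# Sketch of the "Using the systemd user units" steps from INSTALL.md.
# install_user_units is a hypothetical helper; run it from the repo root.
install_user_units () {
    local bin_dir="${1:-$HOME/bin}"
    local unit_dir="${XDG_CONFIG_HOME:-$HOME/.config}/systemd/user"

    # cp fails when the destination directory is missing, so create both
    mkdir -p "${bin_dir}" "${unit_dir}" || return 1

    # The scripts are run directly by ExecStart=, so keep them executable
    install -m 0755 psi-alerts.sh psi-monitor.sh "${bin_dir}/" || return 1

    # Drop the "-user" suffix so the unit names match the enable commands
    install -m 0644 psi-alerts-user.service "${unit_dir}/psi-alerts.service" || return 1
    install -m 0644 psi-monitor-user.service "${unit_dir}/psi-monitor.service" || return 1
}

# Usage, followed by the enable step from INSTALL.md:
#   install_user_units
#   systemctl --user enable --now psi-monitor.service psi-alerts.service
```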
psi-alerts-user.service (new file, 12 lines)

```diff
@@ -0,0 +1,12 @@
+[Unit]
+Description=Pressure Stall Information (PSI) alerts
+PartOf=default.target
+After=psi-monitor.service
+
+[Service]
+Type=simple
+ExecStart=psi-alerts.sh
+
+[Install]
+WantedBy=default.target
+
```

psi-alerts.sh

```diff
@@ -56,6 +56,14 @@ notification_cmd="${NOTIFICATION_CMD}"
 notification_hist_cmd="${NOTIFICATION_HIST_CMD}"
 notification_opts="${NOTIFICATION_OPTS}"
 id_idx="${NOTIFICATION_IDX}"
+user=false
+
+if [[ -n "${1}" ]]; then
+    if [[ "${1}" == "-u" ]] || \
+       [[ "${1}" == "--user" ]]; then
+        user=true
+    fi
+fi
 
 get_ssh_agent () {
     for dir in /tmp/ssh-*; do
```

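The new option handling only inspects `$1`, and any other value there is silently ignored. A sketch of the same `-u`/`--user` switch written with `case`, which also reports unknown arguments; `parse_args` and the `journal_opts` array in the usage comment are hypothetical names, not part of psi-alerts.sh, and the sketch is written for bash.

```shell
# Hypothetical helper (not part of psi-alerts.sh): set the global $user
# from the command line, accepting -u/--user and rejecting anything else
# instead of ignoring it.
parse_args () {
    user=false
    while (( $# > 0 )); do
        case "$1" in
            -u|--user)
                user=true
                ;;
            *)
                printf 'unknown argument: %s\n' "$1" >&2
                return 2
                ;;
        esac
        shift
    done
}

# Usage: decide the journalctl scope once, before the monitoring loop
# (svc is the monitored unit name, as in psi-alerts.sh):
#   parse_args "$@" || exit 2
#   journal_opts=(-u "${svc}" -n1)
#   if [[ "${user}" == true ]]; then journal_opts=(--user "${journal_opts[@]}"); fi
#   line=$(journalctl "${journal_opts[@]}")
```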
@ -89,7 +97,7 @@ print_stats () {
|
|||||||
pidstat -ul --human
|
pidstat -ul --human
|
||||||
;;
|
;;
|
||||||
IO)
|
IO)
|
||||||
sudo iotop --batch --only --iter=1
|
sudo iotop --batch --only --iter=10
|
||||||
printf "\n\n"
|
printf "\n\n"
|
||||||
pidstat -dl --human
|
pidstat -dl --human
|
||||||
;;
|
;;
|
||||||
@ -132,7 +140,7 @@ send_notice () {
|
|||||||
print "Connection to notification daemon failed!" >&2
|
print "Connection to notification daemon failed!" >&2
|
||||||
false
|
false
|
||||||
else
|
else
|
||||||
echo ${notification_id}
|
print ${notification_id}
|
||||||
true
|
true
|
||||||
fi
|
fi
|
||||||
elif [[ -n "${ssh_id_path}" ]]; then
|
elif [[ -n "${ssh_id_path}" ]]; then
|
||||||
@ -141,11 +149,11 @@ send_notice () {
|
|||||||
print "Connection to notification daemon failed!" >&2
|
print "Connection to notification daemon failed!" >&2
|
||||||
false
|
false
|
||||||
else
|
else
|
||||||
echo ${notification_id}
|
print ${notification_id}
|
||||||
true
|
true
|
||||||
fi
|
fi
|
||||||
else
|
else
|
||||||
echo "No SSH notifications configured. Returning." >&2
|
print "No SSH notifications configured. Returning." >&2
|
||||||
false
|
false
|
||||||
fi
|
fi
|
||||||
#set +x
|
#set +x
|
||||||
@ -154,7 +162,7 @@ send_notice () {
|
|||||||
send () {
|
send () {
|
||||||
#set -x
|
#set -x
|
||||||
if [[ "${#@}" -lt 2 ]] && [[ "${#@}" -gt 3 ]]; then
|
if [[ "${#@}" -lt 2 ]] && [[ "${#@}" -gt 3 ]]; then
|
||||||
echo "Wrong number of arguments to send()!" >&2
|
print "Wrong number of arguments to send()!" >&2
|
||||||
return false
|
return false
|
||||||
fi
|
fi
|
||||||
|
|
||||||
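The argument check in `send()` cannot fire as written: no count is both below 2 and above 3, so the `&&` presumably should be `||`. Its `return false` is also suspect. psi-alerts.sh appears to be a zsh script (it uses `integer` and `print`), and zsh evaluates the argument of `return` as an arithmetic expression, so an unset name like `false` most likely yields status 0, i.e. success. A sketch of the presumably intended guard, assuming two or three arguments are valid; `send_args_ok` is a hypothetical name.

```shell
# Sketch of the check send() appears to intend: accept two or three
# arguments, fail otherwise. send_args_ok is a hypothetical helper.
send_args_ok () {
    if (( $# < 2 || $# > 3 )); then
        printf '%s\n' "Wrong number of arguments to send()!" >&2
        return 1    # explicit numeric failure status
    fi
    return 0
}

# Inside send() the equivalent would be:
#   send_args_ok "$@" || return 1
```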
@ -236,7 +244,7 @@ exec_notices () {
|
|||||||
send "${psi_type}" "${current_alarms}" "${email_to}"
|
send "${psi_type}" "${current_alarms}" "${email_to}"
|
||||||
;;
|
;;
|
||||||
*)
|
*)
|
||||||
echo "Something went wrong!" >&2
|
print "Something went wrong!" >&2
|
||||||
false
|
false
|
||||||
;;
|
;;
|
||||||
esac
|
esac
|
||||||
@ -257,7 +265,7 @@ check_dunst_id_is_visible () {
|
|||||||
"${notification_hist_cmd} | jq '.data[0][].id.data'"); then
|
"${notification_hist_cmd} | jq '.data[0][].id.data'"); then
|
||||||
if ! ids=$(ssh -qi "${ssh_id_path}" -p ${ssh_port} -l "${ssh_user}" \
|
if ! ids=$(ssh -qi "${ssh_id_path}" -p ${ssh_port} -l "${ssh_user}" \
|
||||||
"${ssh_host}" "${notification_hist_cmd} | jq '.data[0][].id.data'"); then
|
"${ssh_host}" "${notification_hist_cmd} | jq '.data[0][].id.data'"); then
|
||||||
echo "Connection to dunst failed!" >&2
|
print "Connection to dunst failed!" >&2
|
||||||
return 2
|
return 2
|
||||||
fi
|
fi
|
||||||
fi
|
fi
|
||||||
@ -278,10 +286,14 @@ local last_line=""
|
|||||||
|
|
||||||
#set -x
|
#set -x
|
||||||
while true; do
|
while true; do
|
||||||
local line=$(journalctl -u ${svc} -n1)
|
if ${user}; then
|
||||||
local now=$(date +%s)
|
line=$(journalctl --user -u ${svc} -n1)
|
||||||
local last_timestamp=$(date -d "$(awk '{print $1" "$2" "$3}' <<< "${line}")" +%s)
|
else
|
||||||
local time_diff=$(( now - last_timestamp ))
|
line=$(journalctl -u ${svc} -n1)
|
||||||
|
fi
|
||||||
|
now=$(date +%s)
|
||||||
|
last_timestamp=$(date -d "$(awk '{print $1" "$2" "$3}' <<< "${line}")" +%s)
|
||||||
|
time_diff=$(( now - last_timestamp ))
|
||||||
if [[ "${last_line}" == "${line}" ]]; then
|
if [[ "${last_line}" == "${line}" ]]; then
|
||||||
# last line hasn't changed, check to see if we can clear alarms
|
# last line hasn't changed, check to see if we can clear alarms
|
||||||
if (( time_diff >= 3 )); then
|
if (( time_diff >= 3 )); then
|
||||||
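The loop recovers the age of the last journal entry by cutting the month, day and time out of journalctl's default output and feeding them back to `date -d`. journalctl's `-o short-unix` output mode starts each line with seconds since the epoch instead, which removes that round-trip, and `-q` suppresses informational lines (for example the "Logs begin at" header some systemd versions print) that could otherwise be captured along with the entry. A sketch of the age computation under those assumptions; `entry_age` is a hypothetical name and `svc` is taken from psi-alerts.sh.

```shell
# Sketch: seconds elapsed since a journal entry, given one line of
# "journalctl -o short-unix" output (first field: UNIX time with a
# fractional part). entry_age is a hypothetical helper name.
entry_age () {
    local line="$1" now="$2"
    local ts="${line%% *}"     # first whitespace-separated field
    ts="${ts%%.*}"             # drop the fractional seconds
    [[ "${ts}" =~ ^[0-9]+$ ]] || return 1
    printf '%d\n' $(( now - ts ))
}

# Usage inside the monitoring loop:
#   line=$(journalctl -q -o short-unix -u "${svc}" -n1)
#   time_diff=$(entry_age "${line}" "$(date +%s)")
```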
@ -291,7 +303,7 @@ while true; do
|
|||||||
for alarm in ${alarms}; do
|
for alarm in ${alarms}; do
|
||||||
integer elapsed=$(( now - ${secs[${alarm}]} ))
|
integer elapsed=$(( now - ${secs[${alarm}]} ))
|
||||||
if is_clear "${alarm}" && (( elapsed >= 300 )); then
|
if is_clear "${alarm}" && (( elapsed >= 300 )); then
|
||||||
current_alarms=$(sed -E "s/${alarm}\|?//" <<< "${current_alarms}")
|
current_alarms=$(sed -E "s/${alarm}\|?//; s/|$//" <<< "${current_alarms}")
|
||||||
unset "notice_sent[${alarm}]"
|
unset "notice_sent[${alarm}]"
|
||||||
unset "secs[${alarm}]"
|
unset "secs[${alarm}]"
|
||||||
fi
|
fi
|
||||||
|
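The second expression added to the `sed -E` call, `s/|$//`, is meant to drop a trailing separator. In extended regular expressions an unescaped `|` is alternation, so the pattern can match the empty string at the start of the line and most likely leaves a trailing `|` in place; the escaped form would be `s/\|$//`. Splitting the list on the separator sidesteps regex escaping altogether. A sketch, assuming `current_alarms` is a `|`-separated list of alarm names such as `CPU|IO|MEM`; `remove_alarm` is a hypothetical name.

```shell
# Sketch: remove one alarm name from a "|"-separated list without
# regular expressions. remove_alarm is a hypothetical helper name.
remove_alarm () {
    local list="$1" target="$2" item out=""
    local IFS='|'               # split the unquoted ${list} on "|"
    for item in ${list}; do
        # skip empty fields and the alarm being cleared
        [[ -z "${item}" || "${item}" == "${target}" ]] && continue
        out="${out:+${out}|}${item}"    # join with "|" between items only
    done
    printf '%s\n' "${out}"
}

# Usage in the clearing loop:
#   current_alarms=$(remove_alarm "${current_alarms}" "${alarm}")
```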
Submodule psi-by-example updated: 08c476910f...e09aacd35f

psi-monitor-user.service (new file, 10 lines)

```diff
@@ -0,0 +1,10 @@
+[Unit]
+Description=Pressure Stall Information (PSI) Monitor
+PartOf=default.target
+
+[Service]
+Type=simple
+ExecStart=/home/trey/bin/psi-monitor.sh 80
+
+[Install]
+WantedBy=default.target
```

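This unit hard-codes */home/trey/bin/psi-monitor.sh*, so an installed copy only works for that account. systemd expands the `%h` specifier to the home directory of the user running the service manager, which makes the path user-independent. A minimal sketch that rewrites the installed copy, assuming GNU `sed -i` and that the script lives in *~/bin/*; `use_home_specifier` is a hypothetical name, and a `systemctl --user daemon-reload` is needed after editing an already-loaded unit.

```shell
# Sketch: replace a hard-coded /home/<name>/bin/ prefix on the ExecStart=
# line with systemd's %h specifier. use_home_specifier is hypothetical;
# sed -i (in-place editing) is the GNU sed form.
use_home_specifier () {
    sed -i 's|^ExecStart=/home/[^/]*/bin/|ExecStart=%h/bin/|' "$1"
}

# Usage:
#   use_home_specifier ~/.config/systemd/user/psi-monitor.service
#   systemctl --user daemon-reload
```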
psi-monitor.sh (new executable file, 62 lines)

```diff
@@ -0,0 +1,62 @@
+#!/usr/bin/env zsh
+#
+# Pressure Stall Information monitor
+#
+# Copyright © 2023 Trey Blancher $(base64 -d <<< dHJleUBibGFuY2hlci5uZXQK)
+#
+# This program is free software: you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by the Free
+# Software Foundation, either version 3 of the License, or (at your option)
+# any later version.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+# or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+# for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program. If not, see <https://www.gnu.org/licenses/>.
+#
+# Submodules may be distributed under a separate software license; see the
+# LICENSE file within each submodule.
+#
+# This script monitors the three pressure stall information files
+# /proc/pressure{cpu,io,memory} and reports if any resource is above threshold
+# for the "some" values. It takes an optional single argument, the threshold at
+# which to alert. If this is not supplied, it defaults to a threshold of 30.0
+# percent.
+#
+
+local cpu="/proc/pressure/cpu"
+local cpu_ctr=0
+local io="/proc/pressure/io"
+local io_ctr=0
+local mem="/proc/pressure/memory"
+local mem_ctr=0
+local threshold=30.0
+
+if [[ -n "${1}" ]]; then
+    threshold=${1}
+fi
+
+# main loop
+while true; do
+    local cpu_pct=$(grep 'some' ${cpu} | awk '{print $2}' | awk -F'=' '{print $2}')
+    local io_pct=$(grep 'some' ${io} | awk '{print $2}' | awk -F'=' '{print $2}')
+    local mem_pct=$(grep 'some' ${mem} | awk '{print $2}' | awk -F'=' '{print $2}')
+
+    if (( cpu_pct > threshold )); then
+        cpu_ctr=$(( ${cpu_ctr} + 1 ))
+        printf "CPU PSI event %d triggered.\n" ${cpu_ctr}
+    fi
+    if (( io_pct > threshold )); then
+        io_ctr=$(( ${io_ctr} + 1 ))
+        printf "IO PSI event %d triggered.\n" ${io_ctr}
+    fi
+    if (( mem_pct > threshold )); then
+        mem_ctr=$(( ${mem_ctr} + 1 ))
+        printf "MEM PSI event %d triggered.\n" ${mem_ctr}
+    fi
+
+    sleep 10
+done
```
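psi-monitor.sh relies on zsh's floating-point arithmetic: `(( cpu_pct > threshold ))` works there with values like `31.50`, whereas bash's `(( ))` is integer-only and rejects them. Each */proc/pressure/* file has a `some` line of the form `some avg10=… avg60=… avg300=… total=…`, and the script compares the `avg10` field. For anyone porting the check to bash, a sketch that extracts the same field and does the comparison in awk; `psi_some_avg10` and `psi_above` are hypothetical names.

```shell
# Sketch (bash): read the "some avg10" percentage from a PSI file and
# compare it with a floating-point threshold in awk, since bash arithmetic
# is integer-only. psi_some_avg10 and psi_above are hypothetical helpers.
psi_some_avg10 () {
    # Matching line looks like: some avg10=1.23 avg60=0.50 avg300=0.10 total=12345
    awk '$1 == "some" { split($2, kv, "="); print kv[2] }' "$1"
}

psi_above () {
    # Exit status 0 when $1 > $2, compared numerically by awk
    awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 > t + 0) }'
}

# Usage, mirroring the CPU branch of the main loop:
#   cpu_pct=$(psi_some_avg10 /proc/pressure/cpu)
#   if psi_above "${cpu_pct}" "${threshold}"; then
#       cpu_ctr=$(( cpu_ctr + 1 ))
#       printf "CPU PSI event %d triggered.\n" "${cpu_ctr}"
#   fi
```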