urlwatch-jobs - Man Page

Job types and configuration for urlwatch

Synopsis

urlwatch --edit

Description

A job describes one resource that urlwatch(1) monitors for changes.

The list of jobs to run is stored in the configuration file urls.yaml, which can be edited with the command urlwatch --edit. Jobs are separated from each other by a line containing only ---. The command urlwatch --list prints the name of each job along with its index number (1, 2, 3, ...), which is assigned automatically according to the job's position in the configuration file.

While optional, it is recommended that each job starts with a name entry:

name: "This is a human-readable name/label of the job"
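
As an illustration of the layout described above, a urls.yaml with two jobs might look like the following sketch. The job names, URL, and command are placeholders chosen for this example and are not taken from a real configuration; the url and command keys are explained in the job type sections of this page.

```yaml
# First job: index 1 in the output of "urlwatch --list"
name: "Example homepage"
url: "https://example.org/"
---
# Second job: index 2; the line containing only --- separates it from the first
name: "Home directory listing"
command: "ls -al ~"
```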

The following job types are available:

URL

This is the main job type -- it retrieves a document from a web server:

name: "urlwatch homepage"
url: "https://thp.io/2008/urlwatch/"

Required keys:

Job-specific optional keys:

(Note: url implies kind: url)

Browser

This job type is a resource-intensive variant of "URL" to handle web pages that require JavaScript to render the content being monitored.

The optional playwright package must be installed in order to run Browser jobs (see Dependencies). You will also need to install the browsers using playwright install (see Playwright Installation <https://playwright.dev/python/docs/intro> for details).

name: "A page with JavaScript"
navigate: "https://example.org/"

Required keys:

Job-specific optional keys:

Because this job uses Playwright <https://playwright.dev/python/> to render the page in a headless browser instance, it uses far more resources than a "URL" job. Use it only on pages where url does not return the correct results. In many cases, instead of using a "Browser" job, you can monitor the output of an API that the page calls as it loads and that contains the information you are looking for, using the much faster "URL" job type.

(Note: navigate implies kind: browser)
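
A minimal sketch of the alternative mentioned above: if the page loads its data from an API endpoint, a plain "URL" job can watch that endpoint directly. The endpoint address below is hypothetical; the real one has to be found by inspecting the network requests the page makes, for example in a browser's developer tools.

```yaml
# Hypothetical endpoint; replace with the request the page actually makes
name: "Data behind a JavaScript page"
url: "https://example.org/api/items.json"
```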

Shell

This job type lets you watch the output of arbitrary shell commands, which is useful for, e.g., monitoring an FTP uploader folder or the output of scripts that query external devices (RPi GPIO).

name: "What is in my Home Directory?"
command: "ls -al ~"

Required keys:

Job-specific optional keys:

(Note: command implies kind: shell)

Configuring stderr behavior for shell jobs

By default, urlwatch captures stderr and uses it for error reporting when the shell job exits with a non-zero exit code, but ignores stderr output when the job exits with exit code 0.

This behavior can be customized using the stderr key:

  • ignore: Capture stderr, report on non-zero exit code, ignore otherwise (default)
  • urlwatch: stderr of the shell job is sent to stderr of the urlwatch process; any error message on stderr will not be visible in the error message from the reporter (legacy default behavior of urlwatch 2.24 and older)
  • fail: Treat the job as failed if there is any output on stderr, even with exit status 0
  • stdout: Merge stderr output into stdout, which means stderr output is also considered for the change detection/diff part of urlwatch (this is similar to 2>&1 in a shell)

For example, this job definition will make the job appear as failed, even though the script exits with exit code 0:

command: |
  echo "Normal standard output."
  echo "Something goes to stderr, which makes this job fail." 1>&2
  exit 0
stderr: fail

On the other hand, if you want to diff both stdout and stderr of the job, use this:

command: |
  echo "An important line on stdout."
  echo "Another important line on stderr." 1>&2
stderr: stdout

Optional Keys for All Job Types

Setting Keys for All Jobs at Once

The main configuration file has a job_defaults key that can be used to configure keys for all jobs at once.

See urlwatch-config(5) for how to configure job defaults.
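
As a rough sketch of what this can look like in the main configuration file: the sub-keys all and url (meant to apply to every job and to "URL" jobs only, respectively) and the keys set under them are assumptions based on urlwatch-config(5), so consult that page for the authoritative list.

```yaml
job_defaults:
  # Assumed: applies to jobs of every kind
  all:
    diff_tool: wdiff
  # Assumed: applies only to "URL" jobs
  url:
    ignore_connection_errors: true
```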

Examples

See urlwatch-cookbook(7) for example job configurations.

Files

$XDG_CONFIG_HOME/urlwatch/urls.yaml

See Also

urlwatch(1), urlwatch-intro(7), urlwatch-filters(5)

May 03, 2023 urlwatch