Configuration & Setup

Default settings are rarely optimal. A few deliberate configuration choices improve download efficiency, output quality, and privacy.

Configuring Downloaders

Create a central yt-dlp configuration file at /home/user/.config/yt-dlp/config. yt-dlp reads this file on every run, so the options in it become your defaults.

/home/user/.config/yt-dlp/config
# Example /home/user/.config/yt-dlp/config
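# Prefer best video with an AV1 or VP9 codec plus best audio, merged into mkv;
# fall back to the best available format if no such video stream exists.
# yt-dlp reports AV1 as "av01", and a regex value must be quoted.
-f "bv[vcodec~='^(av01|vp0?9)']+ba/bv+ba/b"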
--merge-output-format mkv

# Save to a specific directory
-o '/home/user/Videos/Archive/%(uploader)s/%(upload_date)s - %(title)s [%(id)s].%(ext)s'

# Embed metadata, thumbnail, and subtitles
--embed-metadata
--embed-thumbnail
--write-subs --embed-subs

# Use aria2c as an external downloader for multi-connection downloads (aria2 must be installed)
--downloader aria2c
--downloader-args aria2c:'-x 16 -s 16 -k 1M'

# Keep a record of downloaded files to avoid duplicates
--download-archive /home/user/.config/yt-dlp/downloaded.txt
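
The download archive is a plain-text file, so it can be inspected with standard tools. The following is a minimal sketch, assuming each line holds an extractor name and a video ID separated by a space (for example "youtube <id>"). The function name is_archived is hypothetical and not part of yt-dlp.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an extractor/ID pair is already recorded
# in a yt-dlp download archive.
# Usage: is_archived ARCHIVE_FILE EXTRACTOR VIDEO_ID
is_archived() {
    # -x matches the whole line, -F treats the pattern as a fixed string,
    # -q suppresses output so only the exit status is used.
    grep -qxF "$2 $3" "$1" 2>/dev/null
}
```

For example, is_archived /home/user/.config/yt-dlp/downloaded.txt youtube SOME_ID returns success only if that exact entry is present, which is a quick way to check why a video was skipped.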

Optimizing Web Crawlers

When mirroring a site, always be a "polite" archivist. Rate-limit requests and identify yourself.

Polite wget crawling
wget \
    --recursive \
    --no-parent \
    --convert-links \
    --page-requisites \
    --adjust-extension \
    --user-agent "MyPersonalArchiveBot/1.0 (+http://my-contact-info.com)" \
    --wait=2 \
    --random-wait \
    --limit-rate=200k \
    https://example.com/
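
To apply the same polite settings to every mirror, the command can be wrapped in a small shell function. This is only a sketch: the name polite_mirror and the DRY_RUN variable are hypothetical, and the user-agent string is the same placeholder as above and should be replaced with your own contact details.

```shell
#!/bin/sh
# Hypothetical wrapper: mirror one site with the polite settings shown above.
# Usage: polite_mirror URL
# Set DRY_RUN=1 to print the command instead of running it.
polite_mirror() {
    url=$1
    # Build the argument list once so the printed and executed commands match.
    set -- wget \
        --recursive --no-parent \
        --convert-links --page-requisites --adjust-extension \
        --user-agent "MyPersonalArchiveBot/1.0 (+http://my-contact-info.com)" \
        --wait=2 --random-wait --limit-rate=200k \
        "$url"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Running it once with DRY_RUN=1 is a cheap way to review the exact flags before any request is sent to the target site.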

Knowledge Management Setup

The most important step is choosing where to store your Obsidian vault or Logseq graph. Place it inside your main archive directory on a dedicated drive, and make sure that directory is covered by your backup routine.

Vault location
# Recommended vault location
/mnt/archive/05_knowledge/obsidian_vault/

# Or for Logseq
/mnt/archive/05_knowledge/logseq_graph/
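
The directory can be created with a short helper that also warns when the archive root is not a separate mount. This is a sketch under stated assumptions: the function name init_vault is hypothetical, and the mount check relies on the mountpoint utility from util-linux being available.

```shell
#!/bin/sh
# Hypothetical helper: create the vault directory under an archive root
# and print its full path.
# Usage: init_vault ARCHIVE_ROOT [SUBDIR]
init_vault() {
    root=$1
    vault="$root/${2:-05_knowledge/obsidian_vault}"
    # Warn, but continue, if the root is not its own mount point, since a
    # dedicated drive is recommended. Skipped if mountpoint is not installed.
    if command -v mountpoint >/dev/null 2>&1 && ! mountpoint -q "$root"; then
        echo "warning: $root is not a mount point" >&2
    fi
    mkdir -p "$vault" && printf '%s\n' "$vault"
}
```

For Logseq, pass the subdirectory explicitly, for example init_vault /mnt/archive 05_knowledge/logseq_graph.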

Community plugins can then tailor the application to your workflow, for example with Kanban boards, calendar integration, or citation managers.

Privacy & Security Configuration

Enable the Tor service to start its SOCKS proxy, which listens on localhost:9050 by default.

Tor configuration
# Enable and start Tor service
sudo systemctl enable --now tor.service

# Route a curl request through Tor (--socks5-hostname also resolves DNS via the proxy)
curl --socks5-hostname localhost:9050 https://check.torproject.org/api/ip

# Use torsocks to wrap any command
torsocks wget https://example.com/file.zip
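
The curl check can be turned into a pass/fail test for scripts. The sketch below assumes the check.torproject.org endpoint answers with JSON that contains an "IsTor" boolean field; the function name is_tor_response is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a check.torproject.org/api/ip response on stdin
# and succeed only if it reports that the request came through Tor.
# Assumes the response is JSON containing an "IsTor" boolean field.
is_tor_response() {
    grep -Eq '"IsTor"[[:space:]]*:[[:space:]]*true'
}

# Usage (requires network access and a running Tor service):
#   curl -s --socks5-hostname localhost:9050 https://check.torproject.org/api/ip \
#       | is_tor_response && echo "traffic is routed through Tor"
```

A pattern match keeps the sketch free of extra dependencies; a JSON-aware tool would be more robust if the response format changes.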