
Sesshoseki is an archival machine and job queue. Any number of computers can work together to fulfill extraction requests (or, "imports") submitted to a central PostgreSQL database.

You could think of it as "gallery-dl@Home." Make a heavy-duty scraper for yourself, or share with your friends!

Design and history

Sesshoseki is the latest iteration in a series of programs built for large art archival communities. New to this version of Sesshoseki is the notion of a base.

Any importer for some service may in theory be represented as a simple procedure. In practice, we would like a consistent structure to guide our thinking. Initial versions of Sesshoseki introduced the concept of type descent to the yiff.party-like importer family. Starting from root-level data, the central action of importers became a process of conceptual decomposition.

Old page from notes, during brainstorming for a "general method" of importing information from a target.

Or in other words, we "disarm" structures until we get what we want. It's roughly inspired by gallery-dl's internal message passing. The difference, however, is that gallery-dl more or less fundamentally operates on URLs, whereas Sesshoseki fundamentally operates on posts.
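As a rough illustration of the "disarming" idea, the sketch below walks a nested structure downward until only posts remain. Every name in it (`Creator`, `Page`, `Post`, `descend`) is hypothetical and chosen for this example; none of it is taken from Sesshoseki's actual code.

```python
from dataclasses import dataclass, field
from typing import Iterator, Union


# Hypothetical types for illustration only; not Sesshoseki's real structures.
@dataclass
class Post:
    post_id: str


@dataclass
class Page:
    posts: list[Post] = field(default_factory=list)


@dataclass
class Creator:
    pages: list[Page] = field(default_factory=list)


Node = Union[Creator, Page, Post]


def descend(node: Node) -> Iterator[Post]:
    """Decompose a node one level at a time until posts are reached."""
    if isinstance(node, Post):
        # Bottom of the descent: this is what we wanted.
        yield node
    elif isinstance(node, Page):
        for post in node.posts:
            yield from descend(post)
    elif isinstance(node, Creator):
        for page in node.pages:
            yield from descend(page)
    else:
        raise TypeError(f"cannot descend into {type(node).__name__}")


root = Creator(pages=[Page(posts=[Post("1"), Post("2")]), Page(posts=[Post("3")])])
post_ids = [post.post_id for post in descend(root)]
```

Whatever the root is (a creator, a single page, or a lone post), the caller always ends up with posts, which is the sense in which the process operates on posts rather than URLs.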

Newer page, where the importer is first gutted of service-specific functionality and assisted by another set of procedures. On the side, "extract-and-slurp."

This idea has since been developed so that extraction (interfacing with archival targets and converting their data to internal structures) and storage of data are separate. You may write base procedures to specify how the root input for different services is converted, and test this functionality separately from the storage process. Importers, on the other hand, have become general-purpose. While only one standard importer currently exists, other types are possible as the needs of gallery projects evolve.
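To make the separation concrete, here is a minimal sketch under stated assumptions: a base is modeled as a plain callable that turns root input into internal structures, and a general-purpose importer accepts any base plus any storage object. The names `Post`, `example_base`, `MemoryStore`, and `run_import`, and the shape of the root input, are all invented for this example and do not reflect Sesshoseki's real API.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Protocol


@dataclass(frozen=True)
class Post:
    """Hypothetical internal structure produced by a base."""
    service: str
    post_id: str
    title: str = ""
    file_urls: tuple[str, ...] = ()


# Assumption: a base is just "root input in, internal structures out".
Base = Callable[[dict], Iterable[Post]]


def example_base(root: dict) -> Iterable[Post]:
    """Hypothetical base for an imaginary service named 'example'."""
    for raw in root.get("posts", []):
        yield Post(
            service="example",
            post_id=str(raw["id"]),
            title=raw.get("title", ""),
            file_urls=tuple(raw.get("files", [])),
        )


class Store(Protocol):
    def save(self, post: Post) -> None: ...


@dataclass
class MemoryStore:
    """In-memory stand-in for the storage side."""
    saved: list[Post] = field(default_factory=list)

    def save(self, post: Post) -> None:
        self.saved.append(post)


def run_import(base: Base, root: dict, store: Store) -> int:
    """General-purpose importer: knows nothing about any particular service."""
    count = 0
    for post in base(root):
        store.save(post)
        count += 1
    return count


root = {"posts": [{"id": 1, "title": "first", "files": ["https://example.invalid/a.png"]}, {"id": 2}]}

# The base can be exercised on its own, with no storage involved.
extracted = list(example_base(root))

# The same base can then be handed to the importer together with a store.
store = MemoryStore()
imported = run_import(example_base, root, store)
```

Because the importer only depends on the callable and the store interface, supporting another service in this sketch means writing another base, not another importer.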

As usual, a standard scraping and recording library is provided.

Quick start

Sesshoseki requires Uv, PostgreSQL 17 or greater, and Rclone. You can also bring Dragonfly (recommended) or another Rclone-compatible memory store for log sharing over the network. Chromium and a desktop environment may or may not be required, depending on which bases you run.

You also need a package of bases to use.

mv config.yml.example config.yml  # open config.yml and configure
psql -f schema.sql -U honcho postgres  # prepare nekoschema
## copy bases to private
uv add --editable --no-workspace ./private
uv sync
uv run sesshoseki 0 10  # run worker zero with max ten concurrent tasks

Developing

uv sync --all-extras
uv run pre-commit install
# docs!
uv run sphinx-build -M html docs build
open build/html/index.html

Licensed under the GNU Affero General Public License. tldr.

On Selenium-Driverless

# Selenium-Driverless is a library provided under the Attribution-
#   NonCommercial-ShareAlike 4.0 International license of the
#   Creative Commons Corporation, the source of which may be
#   viewed at these URLs:
#    https://creativecommons.org/licenses/by-nc-sa/4.0/
#    https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode.en
# Explicit permission has been provided by the lead maintainer of and
#   majority contributor to this library as of 4/8/2024 for exceptions
#   allowing integration within Sesshoseki, a GNU Affero General Public
#   License v3 project, without adhering to the "share-alike" requirement,
#   and usage within Nekohouse on a certain commercial basis.
# If you wish to use Sesshoseki's integration with this library in a
#   commercial capacity, you may be required to ask the library's author
#   for permission to do so by law.
# Thank you to Steve/Aurin Aegerte/`kaliiiiiiiiiii` for his excellent
#   work! ^_^