goodread

v0.1.0

The first public release of goodread: the full command surface, the goodread library, open-endpoint routing, and the crawl pipeline.

goodread is a single pure-Go binary that turns public Goodreads pages into structured records. It can look up a book, an author, a series, a list, a genre, a user, or a quote; search the catalog; read a shelf; and crawl in bulk. It talks to www.goodreads.com over plain HTTPS with no API key, so there is nothing to sign up for and nothing to pay for.

What you get

  • Search the catalog. goodread search queries the open autocomplete endpoint for books and authors, with --books for rich book records and --html for the full search page.
  • Look up records. book, author, series, list, genre, user, and quote each take an id or a URL and return a structured record, JSON-LD first with an HTML-selector fallback.
  • Read shelves. goodread shelf reads a reader's bookshelf from the public RSS feed by default, with --html and --max-pages to walk the paginated shelf when you need more.
  • Find related work. similar and reviews read what a book page links to, and id classifies a URL into an (entity, id) pair without fetching it.
  • Crawl in bulk. seed discovers URLs from the sitemap, crawl drains the queue into a local SQLite store, and db inspects and exports what you collected. cache manages the on-disk page cache.
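
The "JSON-LD first with an HTML-selector fallback" strategy mentioned in the list above can be sketched in a few lines of Go. This is only an illustration under stated assumptions, not goodread's actual parser: the Record type, the parseBook function, the regular expressions, and both sample pages are hypothetical, and a <title> match stands in for a real HTML selector.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

// Record is a hypothetical, trimmed-down book record for this sketch.
type Record struct {
	Title  string
	Rating float64
	Source string // "json-ld" or "html"
}

// ldScript captures the body of a <script type="application/ld+json"> tag.
var ldScript = regexp.MustCompile(`(?s)<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>`)

// titleTag is the stand-in "HTML selector" used when no usable JSON-LD is present.
var titleTag = regexp.MustCompile(`(?s)<title>(.*?)</title>`)

// parseBook tries the JSON-LD block first and falls back to the <title> tag.
func parseBook(page string) (Record, bool) {
	if m := ldScript.FindStringSubmatch(page); m != nil {
		var ld struct {
			Name            string `json:"name"`
			AggregateRating struct {
				RatingValue float64 `json:"ratingValue"`
			} `json:"aggregateRating"`
		}
		if err := json.Unmarshal([]byte(m[1]), &ld); err == nil && ld.Name != "" {
			return Record{Title: ld.Name, Rating: ld.AggregateRating.RatingValue, Source: "json-ld"}, true
		}
	}
	if m := titleTag.FindStringSubmatch(page); m != nil {
		return Record{Title: strings.TrimSpace(m[1]), Source: "html"}, true
	}
	return Record{}, false
}

// Hypothetical sample pages, not real Goodreads markup or data.
const withLD = `<html><head><title>ignored</title>
<script type="application/ld+json">{"@type":"Book","name":"The Hunger Games","aggregateRating":{"ratingValue":4.34}}</script>
</head></html>`

const withoutLD = `<html><head><title> The Hunger Games </title></head></html>`

func main() {
	for _, page := range []string{withLD, withoutLD} {
		rec, ok := parseBook(page)
		fmt.Println(rec.Title, rec.Rating, rec.Source, ok)
	}
}
```

Trying the structured block first keeps the parser independent of page layout whenever the metadata is present; the fallback only has to cover pages that lack it.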

Open-endpoint routing

Goodreads sits behind an AWS WAF that intermittently challenges some HTML pages. goodread routes around it where it can: search uses the autocomplete JSON endpoint and shelf uses the public RSS feed, neither of which is challenged. The commands that read /book/show/ (book, similar, reviews) can still meet a challenge; when they do, goodread exits cleanly with code 5 and prints a hint suggesting --cookies to supply a signed-in session. See troubleshooting.
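
A script that wraps the CLI can tell a challenge apart from other failures by checking for exit code 5. The sketch below shows one way to do that from Go with os/exec. It assumes the goodread binary is on PATH and that book accepts a bare id as described above; the run and isChallenge helpers are hypothetical names, not part of goodread.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// challengeExitCode is the exit code the release notes give for a WAF challenge.
const challengeExitCode = 5

// isChallenge reports whether err is a process exit with the challenge code.
func isChallenge(err error) bool {
	var exitErr *exec.ExitError
	return errors.As(err, &exitErr) && exitErr.ExitCode() == challengeExitCode
}

// run executes a command and separates "challenged" from every other failure.
func run(name string, args ...string) (out []byte, challenged bool, err error) {
	out, err = exec.Command(name, args...).Output()
	if isChallenge(err) {
		return out, true, nil
	}
	return out, false, err
}

func main() {
	// Assumes goodread is installed; the id is the one used in the library example.
	out, challenged, err := run("goodread", "book", "2767052")
	switch {
	case challenged:
		fmt.Println("challenged by the WAF; a retry with --cookies may help")
	case err != nil:
		fmt.Println("lookup failed:", err)
	default:
		fmt.Printf("%s", out)
	}
}
```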

The crawl pipeline

For more than a page at a time, the pipeline has three stages: seed to discover URLs, crawl to fetch and parse them, and db to export the results. Everything lands in one SQLite file under the data dir, with a content-addressed gzip page cache beside it so re-runs do not re-fetch unchanged pages. goodread is polite by default: a two-second delay between requests and two workers.
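
To make "content-addressed gzip page cache" concrete, here is a minimal sketch of the storage side of such a cache. How goodread derives its keys, lays out files, or decides that a page is unchanged is not described here, so the SHA-256 key, the <key>.gz file naming, and the put, get, and roundTrip helpers are all assumptions made for illustration.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// contentKey names a page by the SHA-256 of its bytes, so equal pages share one file.
func contentKey(body []byte) string {
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])
}

// put gzips body into dir/<key>.gz. written is false when the file already existed.
func put(dir string, body []byte) (key string, written bool, err error) {
	key = contentKey(body)
	path := filepath.Join(dir, key+".gz")
	if _, statErr := os.Stat(path); statErr == nil {
		return key, false, nil // same content already stored: nothing to write
	}
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err = zw.Write(body); err != nil {
		return "", false, err
	}
	if err = zw.Close(); err != nil {
		return "", false, err
	}
	if err = os.WriteFile(path, buf.Bytes(), 0o644); err != nil {
		return "", false, err
	}
	return key, true, nil
}

// get reads and decompresses the page stored under key.
func get(dir, key string) ([]byte, error) {
	f, err := os.Open(filepath.Join(dir, key+".gz"))
	if err != nil {
		return nil, err
	}
	defer f.Close()
	zr, err := gzip.NewReader(f)
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// roundTrip stores body twice in a throwaway directory and reads it back.
func roundTrip(body []byte) (bool, bool, []byte, error) {
	dir, err := os.MkdirTemp("", "pagecache")
	if err != nil {
		return false, false, nil, err
	}
	defer os.RemoveAll(dir)

	key, first, err := put(dir, body)
	if err != nil {
		return false, false, nil, err
	}
	_, second, err := put(dir, body)
	if err != nil {
		return false, false, nil, err
	}
	got, err := get(dir, key)
	return first, second, got, err
}

func main() {
	page := []byte("<html>hypothetical page body</html>")
	first, second, got, err := roundTrip(page)
	if err != nil {
		fmt.Println("cache error:", err)
		return
	}
	fmt.Println(first, second, bytes.Equal(got, page))
}
```

Because the key depends only on the bytes, storing the same page a second time is a no-op. Skipping the network request itself would additionally need a mapping from URL to key, which this sketch leaves out.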

The goodread library

The parsing and fetching code lives in its own package, so you can read Goodreads pages from your own program without the CLI:

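package main
import (
    "context"
    "fmt"
    "log"
    "github.com/tamnd/goodread-cli/pkg/goodread"
)
func main() {
    c := goodread.New()
    // Look up a book by its Goodreads id.
    book, err := c.Book(context.Background(), "2767052")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(book.Title, book.AvgRating)
}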

Independent and public-data only

goodread is an independent, open-source tool. It is not affiliated with, endorsed by, or sponsored by Goodreads or Amazon. It reads only public pages, at a polite default rate.

Install

go install github.com/tamnd/goodread-cli/cmd/goodread@latest

Prebuilt archives for Linux, macOS, Windows, and FreeBSD, plus Linux packages (deb, rpm, apk), SBOMs, and cosign-signed checksums, are on the release page. There are also a Homebrew cask and a Scoop entry. The cask installs with:

brew install --cask tamnd/tap/goodread

The multi-arch container image is on GHCR:

docker run --rm ghcr.io/tamnd/goodread:0.1.0 search "the hunger games"

The binary is pure Go (CGO_ENABLED=0) with no runtime dependencies.