WikiPage

Wikipedia Pager for Physical Printing
Evolutionary Development of Print-Optimized Rendering


Abstract

This document describes an evolutionary software development process applied to the problem of rendering Wikipedia articles for print. The process has iterated through four generations beyond the initial population (Generation 0), selecting for three fitness criteria: pagination quality (absence of orphaned headings), mathematical equation rendering, and two-column layout balance. The current population (Generation 4) contains two candidates that satisfy these criteria on the test corpus.

Problem Statement

Wikipedia articles rendered through browser print functions exhibit two defects:

  1. Orphaned headings: Section headings appearing at page bottoms without following content, violating typographic convention.
  2. Equation rendering failure: Mathematical notation (rendered as SVG images by Wikipedia) breaking when converted to data URIs via HTML canvas.
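The orphaned-heading defect can be stated as a simple check. The sketch below is an illustration only, assuming a simplified model in which each block is described by its vertical offset and pages have a fixed height; the `Block` type, `pageIndex`, and `findOrphanedHeadings` are hypothetical names, not part of any candidate.

```typescript
// Simplified model (assumption): a block is a heading or content,
// positioned by the offset of its top edge from the start of the document.
interface Block {
  kind: "heading" | "content";
  top: number;
}

// Page on which a given vertical offset falls, for fixed-height pages.
function pageIndex(y: number, pageHeight: number): number {
  return Math.floor(y / pageHeight);
}

// Returns the indices of headings whose next content block starts on a later page.
function findOrphanedHeadings(blocks: Block[], pageHeight: number): number[] {
  const orphaned: number[] = [];
  for (let i = 0; i < blocks.length; i++) {
    if (blocks[i].kind !== "heading") continue;
    // Skip over any immediately following sub-headings.
    let j = i + 1;
    while (j < blocks.length && blocks[j].kind !== "content") j++;
    if (j === blocks.length) continue; // no following content at all
    const headingPage = pageIndex(blocks[i].top, pageHeight);
    const contentPage = pageIndex(blocks[j].top, pageHeight);
    if (contentPage > headingPage) orphaned.push(i);
  }
  return orphaned;
}
```

A real check would need measured positions from the paginated output; this model ignores blocks that straddle a page boundary.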

Method

Candidates were developed using an evolutionary model with three stages per generation:

  1. Ontogeny: Expression of genome (specification) into phenotype (implementation)
  2. Ethology: Behavioral observation via automated browser testing (Puppeteer)
  3. Selection: Fitness evaluation and extinction of non-viable candidates
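The selection stage can be sketched as a filter over observed behavior. This is a minimal illustration, assuming each candidate's observation is reduced to one field per fitness criterion; the `Observation` shape and the function names are hypothetical, and the source does not say how fitness is actually scored.

```typescript
// Hypothetical record of what the ethology stage observed for one candidate.
interface Observation {
  id: string;
  orphanedHeadings: number;   // count found on the test article(s)
  equationsRendered: boolean; // equations readable in the output
  columnsBalanced: boolean;   // two-column layout judged balanced
}

// A candidate is viable only if it meets every criterion.
function isViable(o: Observation): boolean {
  return o.orphanedHeadings === 0 && o.equationsRendered && o.columnsBalanced;
}

// Splits a population into survivors and extinct candidates.
function select(population: Observation[]): { survivors: string[]; extinct: string[] } {
  return {
    survivors: population.filter(isViable).map((o) => o.id),
    extinct: population.filter((o) => !isViable(o)).map((o) => o.id),
  };
}
```

Treating every criterion as a hard requirement is a design assumption here; a weighted score would be an alternative.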

Generations

Generation 0: Initial Population

Five candidates spanning three invocation methods: CLI (Node.js), web application, and bookmarklet. CLI candidates (002, 007) went extinct due to environment constraints. Survivors: 005 (web), 006 (bookmarklet).

Population record

Generation 1: Tensor Contraction

The invocation dimension was eliminated; each candidate became a (web app, bookmarklet) pair. Five candidates were tested against the Directed Acyclic Graph article. Finding: CSS-only pagination (104) outperformed Paged.js (103) on orphan prevention, but 104 broke equation rendering.

Population record

Generation 2: Crossover

Objective: combine 104's pagination with 103's equation preservation. Root cause identified: canvas-based image conversion does not handle SVG correctly. Solution: selective image conversion, which preserves SVG/equation images as external URLs and converts raster images to data URIs.
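The selective conversion rule can be sketched as a classifier over image sources. This is a minimal illustration, not the candidates' actual code: the `ImageInfo` shape and function names are hypothetical, and the URL fragment and class-name marker used to recognize equation images are assumptions about how Wikipedia serves them.

```typescript
// Hypothetical descriptor for an <img> element; field names are illustrative.
interface ImageInfo {
  src: string;
  className?: string;
}

type ImagePlan = "keep-external-url" | "convert-to-data-uri";

// True when the image looks like an SVG or a rendered equation.
function isVectorOrEquation(img: ImageInfo): boolean {
  // Drop any query string or fragment before inspecting the path.
  const path = (img.src.split(/[?#]/)[0] ?? "").toLowerCase();
  if (path.endsWith(".svg")) return true;
  // Assumption: rendered math is served from a path containing this segment.
  if (path.includes("/media/math/render/svg/")) return true;
  // Assumption: equation images carry a class name containing "mwe-math".
  return (img.className ?? "").includes("mwe-math");
}

// SVG/equation images stay as external URLs; everything else is inlined.
function planImage(img: ImageInfo): ImagePlan {
  return isVectorOrEquation(img) ? "keep-external-url" : "convert-to-data-uri";
}
```

Under this rule a PNG thumbnail of an SVG file (a path ending in `.png`) is treated as raster and converted, since it no longer needs SVG handling.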

Population record

Generation 3: Layout

Web app only (bookmarklets deprecated). Two-column layout with image differentiation: equations flow inline, illustrations float right. PDF page verification via pdftoppm introduced.
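For the PDF verification step, a minimal sketch of how a pdftoppm invocation might be assembled is shown below. The helper name, the file paths, and the default resolution are hypothetical; only the `-png` and `-r` options and the `PDF-file PPM-root` argument order are taken from pdftoppm itself.

```typescript
// Builds the argument list for: pdftoppm -png -r <dpi> <pdf> <output prefix>
// The helper and its default DPI are illustrative assumptions.
function pdftoppmArgs(pdfPath: string, outputPrefix: string, dpi: number = 100): string[] {
  return ["-png", "-r", String(dpi), pdfPath, outputPrefix];
}

// Example with hypothetical paths: one PNG per page, named from the prefix.
const exampleArgs = pdftoppmArgs("article.pdf", "out/page", 150);
```

The resulting array can be passed to a process-spawning call such as Node's `execFile("pdftoppm", exampleArgs)`, after which each page image can be inspected individually.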

Generation 3 index

Generation 4: Current

Promoted elite candidates from Generation 3. Verified two-column layout with readable equations and text flow around illustrations.

Generation 4 index

Results

Candidate      Pagination Method   Image Handling                       Orphaned Headings
401 (current)  CSS + DOM wrapping  Equations inline, illustrations 40%  0
402 (current)  CSS + flexbox       Equations inline, illustrations 40%  0
204            CSS + DOM wrapping  Selective (SVG preserved)            0

Test corpus: Pythagorean theorem, Alan Turing, Voigt notation (Wikipedia). Includes equation-heavy content with matrices and tensors.

Artifacts

Artifact                 Location
Web application (401)    candidate_401/index.html
Web application (402)    candidate_402/index.html
Specification            spec.md
PDF verification output  gen_4/pdf_output/
All generations          generations.html

Limitations