Bresenham Line Drawing

Comparative reference for five hand-tuned 6502 line implementations on the BBC Micro: Elite, RTW, NJ, Tricky (all MODE 4/4S), and Raster (MODE 2). All numbers from line-drawing-implementations — the comparative-analysis PDF that ships alongside the source repo kieranhj/line-test.

The envelope-pushing options are NJ and Raster: both pull tricks (cumulative-mask batching, carry-chain invariants, deferred counting) that the textbook 6502 Bresenham doesn’t.

TL;DR — pick one

Use casePickWhy
Long lines, performance first, code size no objectNJ~26c/pixel steep; ~5c/pixel shallow pass-through; 8.75c/pixel horizontal
Memory-tight (single page)RTW297 bytes; ~42c/pixel; clean source
Balanced speed/size, ROM-friendly (no SMC)Tricky1270 B; ~34c/pixel; dispatch-table architecture
Polygon edges (chained segments)RasterOpen endpoints, runtime colour, MODE 2
Very short lines (3-8 pixels)RTW or TrickyLow setup cost — NJ’s ~100c setup penalty doesn’t amortise

Steep-line cycle breakdown (per pixel, common path)

ComponentEliteRTWNJTrickyRaster Y-upRaster Y-down
Plot pixel141513131616
Error update984655
Error store/load32322
Y-step555548
Count58488
Carry restore22
Total~35~42~26~34~37~41

(Assumes: no x-step, no row/column crossing, branch-not-taken common path.)

Why NJ wins steep: three independent savings compound — register-flow error (no LDA/STA zp), self-modified SBC #imm instead of SBC zp, and deferred pixel counting (DEC only on x-step). Total advantage over Tricky: 8c/pixel.

Why Raster Y-down costs 4c more than Y-up: DEY/BMI is carry-neutral; INY/CPY #8 clears C and forces an explicit SEC to restore the C=1 invariant.

Shallow-line per-pixel costs

Shallow lines differ fundamentally between implementations:

StrategyImplPass-through cycles/pixelHorizontal (amortised)
Per-pixel RMWElite, Tricky~34-38same
Screen-byte cachingRTW~41 (cached)~32 (8 pixels share read+write)
Cumulative mask batchingNJ~5 (pass-through)~8.75 (8-pixel batch)
Pixel-pair (MODE 2)Raster~21.5 (fast-fast pair)~21.5

NJ’s cumulative mask batching is the standout. For 8 consecutive horizontal pixels: 7 pass-throughs × 5c + 1 batch write (32c) + 3c carry = 70c total = 8.75c/pixel. This is close to the theoretical minimum (the screen RMW alone is ~11c per 8 pixels).

Setup cost — when long-loop wins flip

Setup runs once per line. For short lines it dominates.

ImplSetup cyclesCycles/pixelBreak-even vs RTW (steep)
RTW45-55~42(baseline)
Tricky50-70~34~3 pixels
Elite80-100~35~7 pixels
NJ100-120~26~4 pixels
Raster50-60 (first), ~30 chained~37(different metric)

For a 5-pixel line, RTW (210c + 50c setup = 260c) beats NJ (130c + 110c setup = 240c) only marginally. Below 4 pixels, RTW wins outright. This matters for particle systems and short-segment fonts.

Code size

ImplBytesRatioFits in…
RTW2971.0×One page with room to spare
Elite~3501.2×Two pages
Tricky12704.3×5 pages
Raster14254.8×6 pages (24 loop variants + utilities)
NJ23658.0×10 pages (32 fully-unrolled variants)

On a MODE 4S setup with code at &3000 and screen at &6000, NJ consumes 28% of available code space. RTW leaves ~11.5 KB free.

Architectural taxonomy

Direction handling

ImplMechanismSetup work
EliteConditional jumps select inline blockssmall
RTWSelf-modified branch offsets (4 patches)small
NJSelf-modified SBC/ADC operands (16 patches) + dispatch tablelarge
TrickyPure 128-byte dispatch table + indirect JMPmedium
Raster24 dispatch entries + RTS-trampoline (PHA/PHA/RTS)medium

Error accumulator

ImplWherePer-pixel overhead
Elite, RTW, TrickyZP (LDA + STA)6c
NJ, Rasterregister flow (A↔X)4c

Pixel counting

ImplMechanismPer-pixel cost
EliteDEX/BNE5c
TrickyDEX/BNE4c
RTW, RasterDEC zp/BNE8c
NJcount axis-crossings only0c (common path)

NJ’s deferred counting — decrement only on x-step (steep) or y-step (shallow) — is the highest single-issue ROI of any of the techniques. For dx=10, dy=100 steep line: saves ~360 cycles.

Three innovations worth stealing

The PDF (§9.2) calls these out as the portable ideas. Each gets a dedicated page:

  1. cumulative-mask-batching — NJ’s threading of X across pixel columns so 8 EORs become 1 RMW. Adaptable to any MODE 4 implementation.
  2. carry-chain-invariant — Raster’s systematic C=1 maintenance. Exploits DEY/BMI carry-neutrality on Y-up. Removes SEC from the hot loop.
  3. open-endpoint-chaining — Raster’s “don’t draw last pixel; leave state pointing at endpoint.” Polygon vertices write exactly once → no double-XOR artifacts.

The hybrid ideal (§9.3)

A theoretical best-of-breed MODE 4 implementation would combine: NJ’s cumulative batching + deferred counting + self-mod SBC, Raster’s carry-chain + open endpoints + branch outlining, Tricky’s endpoint-reversal (always draw Y-up). Estimated ~22-24c/pixel steep, ~5c/pixel shallow, ~1500-2000 bytes. Nobody has shipped this; the line-test repo benchmarks the existing four.

Implementation notes worth knowing

  • Endpoint conventions: Elite/RTW/NJ/Tricky draw both endpoints (closed). Raster draws first but not last — designed for shared-vertex polylines. Don’t mix conventions in one renderer.
  • Vertical wrap: only Tricky has it (optional VWRAP flag). Useful for wrap-around playfields.
  • 65C12 / Master: Elite, RTW, NJ, Tricky are NMOS-compatible. Raster requires 65C12 — uses BRA extensively for branch outlining (the analysis doc’s “Works on 65C02: Yes” claim for all five is wrong on Raster; see raster-source).
  • Colour: only Raster supports it. The others are pure-EOR monochrome. Adding colour costs ~5c/pixel (LDA/AND/ORA/STA vs EOR).
  • MODE 2 porting: the MODE-4 implementations need ~3c/pixel extra (AND/ORA pixel writing) plus 64-column screen arithmetic. Raster’s pixel-pair model is likely near-optimal for MODE 2 already.

Cross-references

  • 6502 — instruction timing
  • 6502-isa — SBC/ADC variant cycle costs
  • modes — MODE 4 vs MODE 2 byte layouts
  • memory-map — screen base addresses
  • fast-animation — sibling technique (sprites rather than lines)
  • multiplication — analogous “compare implementations” technique page

This wiki is curated by Claude following the LLM-Wiki methodology — a human curates source documents, the LLM compiles structured cross-linked markdown. Content may contain errors, omissions, or stale claims. For authoritative information refer to the original source documents in the bbc-documents GitHub archive.