Cycle Stretching

The 6502 runs at 2 MHz. Several BBC peripherals run at 1 MHz. To bridge the gap, the BBC’s clock generator stretches the 6502’s ΦIN clock whenever the CPU addresses one of the slow devices — extending the current cycle to 2 or 3 normal-cycle lengths so the slow device has time to respond.

Net effect: every LDA &FE40 (System VIA), every LDA &FE00 (CRTC), every LDA &FC00/&FD00 (1 MHz bus) is 5-6 cycles, not 4. RMW instructions (INC abs, LSR abs, etc.) pay it on every bus cycle that touches the device. This dwarfs almost every other concern in tight 6502 code that talks to peripherals.

Which addresses are stretched

| Address range | Device | Stretched? |
| --- | --- | --- |
| &0000-&7FFF | Main RAM | No (2 MHz) |
| &8000-&BFFF | Sideways ROM/RAM | No (2 MHz) |
| &C000-&FBFF | MOS ROM | No (2 MHz) |
| &FC00-&FCFF | FRED | Yes |
| &FD00-&FDFF | JIM | Yes |
| &FE00-&FE07 | 6845 CRTC | Yes |
| &FE08-&FE0F | 6850 ACIA | Yes |
| &FE10-&FE17 | Serial ULA | Yes |
| &FE18 | STATID (Econet station ID, BBC B/B+) | Yes |
| &FE20-&FE2F | Video ULA | No (2 MHz) |
| &FE30-&FE3F | ROMSEL / ACCCON / Memory control | No (2 MHz) |
| &FE40-&FE5F | System VIA | Yes |
| &FE60-&FE7F | User VIA | Yes |
| &FE80-&FE9F | 8271 / 1770 FDC | No (2 MHz) |
| &FEA0-&FEBF | 68B54 Econet ADLC | No (2 MHz) |
| &FEC0-&FEDF | µPD7002 ADC (BBC B/B+) | Yes |
| &FEE0-&FEFF | Tube ULA | No (2 MHz) |
| &FF00-&FFFF | MOS jumpblock + 6502 vectors | No (2 MHz) |

(Source: beebwiki-cycle-stretching.)
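
For tooling such as an emulator or a cycle counter, the table can be captured as a lookup. The following is a minimal Python sketch, not taken from the source: the names STRETCHED_RANGES and is_stretched are made up for illustration, and any address the table does not list (for example &FE19-&FE1F) is simply reported as not stretched, which is a simplification of this sketch rather than a statement about the hardware.

```python
# 1 MHz (stretched) address ranges, transcribed from the table above.
# Bounds are inclusive; every other address is treated as 2 MHz.
STRETCHED_RANGES = [
    (0xFC00, 0xFCFF),  # FRED
    (0xFD00, 0xFDFF),  # JIM
    (0xFE00, 0xFE07),  # 6845 CRTC
    (0xFE08, 0xFE0F),  # 6850 ACIA
    (0xFE10, 0xFE17),  # Serial ULA
    (0xFE18, 0xFE18),  # STATID
    (0xFE40, 0xFE5F),  # System VIA
    (0xFE60, 0xFE7F),  # User VIA
    (0xFEC0, 0xFEDF),  # uPD7002 ADC
]


def is_stretched(addr):
    """Return True if the table lists addr as a 1 MHz (stretched) location."""
    return any(lo <= addr <= hi for lo, hi in STRETCHED_RANGES)


for addr in (0xFE40, 0xFE21, 0xFE01, 0xFEE0):
    speed = "1 MHz, stretched" if is_stretched(addr) else "2 MHz"
    print(f"&{addr:04X}: {speed}")
```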

The Video ULA, the Tube, the FDC, and the ROMSEL/ACCCON registers are NOT stretched — surprising-but-true. Video ULA palette writes (STA &FE21) cost a flat 4 cycles; CRTC register writes (STA &FE01) cost 5-6.

The Master cartridge slot via the &FCFC-&FCFF page-wide register can be switched to 2 MHz access (OSBYTE &6B X=1, or ACCCON IFJ bit) — see 1mhz-bus Master cartridge section.

How much it costs

The stretch is variable: the stretched cycle lasts 2 or 3 normal-cycle lengths, depending on whether the access started in or out of phase with the 1 MHz clock. Timing for common instructions (the higher figure in each range is the worst case):

| Instruction | Normal cost | Stretched cost (worst case) |
| --- | --- | --- |
| LDA abs / STA abs | 4c | 5-6c |
| LDA abs,X (no page cross) | 4c | 5-6c |
| LDA (zp),Y (no page cross) | 5c | 6-7c |
| INC abs / LSR abs (RMW) | 6c | 7-9c |
| BIT abs | 4c | 5-6c |

The extra is +1c or +2c per stretched bus cycle. RMW instructions touch the device more than once, so they get hit more than once.
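
The arithmetic for the single-access rows can be sketched in Python. The helper name single_access_cost_range is hypothetical, and RMW instructions are deliberately left out of this sketch because they make more than one stretched access.

```python
def single_access_cost_range(base_cycles):
    """Return (best, worst) total cycles for an instruction that makes
    exactly one stretched bus access: the stretch adds +1c or +2c."""
    return base_cycles + 1, base_cycles + 2


for name, base in [("LDA abs", 4), ("LDA (zp),Y", 5), ("BIT abs", 4)]:
    best, worst = single_access_cost_range(base)
    print(f"{name}: {base}c normally, {best}-{worst}c stretched")
```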

Why 2 OR 3 cycles?

The CPU’s 2 MHz clock and the 1 MHz peripheral clock are not phase-locked from the CPU’s perspective. When a stretched access begins, the clock generator has to wait for the next 1 MHz rising edge before letting the cycle proceed:

  • If the CPU’s bus cycle started just before a 1 MHz edge: small extra wait → +1c overall.
  • If just after: must wait nearly a full 1 MHz period → +2c overall.

In practice you can’t predict which stretch a single, isolated access will get without phase-aligning first. Once one stretched access has completed, the stretch on the next one depends on how many 2 MHz cycles separate the two.
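
A toy model makes the two cases concrete. It assumes time is counted in 2 MHz ticks and that the CPU is in step with the 1 MHz clock on even ticks; the tick numbering and the function names are assumptions of this sketch, not taken from the hardware documentation.

```python
def stretch_penalty(access_tick):
    """Extra 2 MHz cycles for a stretched bus cycle starting at access_tick.

    Assumed model: even ticks are in step with the 1 MHz clock.
    """
    # In step (even tick): the cycle lasts 2 ticks, so +1c.
    # Half a 1 MHz period out of step (odd tick): it lasts 3 ticks, so +2c.
    return 1 if access_tick % 2 == 0 else 2


def lda_abs_stretched_cycles(instr_start_tick):
    """Total cycles for LDA abs on a stretched address."""
    # The data read is the 4th cycle of LDA abs (after opcode, low, high).
    access_tick = instr_start_tick + 3
    return 4 + stretch_penalty(access_tick)


for start in (0, 1):
    print(f"start tick {start}: {lda_abs_stretched_cycles(start)} cycles")
```

The same instruction costs 5 or 6 cycles depending only on the tick parity at which it happens to start.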

Phase-aligning to the 1 MHz clock

Reading any stretched address forces the CPU into sync with the 1 MHz phase. After the access, the CPU’s next action lands on a known 1 MHz boundary. So:

LDA &FE40       ; sync read — discard A; aligns next access to 1 MHz
                ; subsequent stretched accesses now have predictable cost

Useful before doing cycle-exact raster work that touches the System VIA timer (T1) or the CRTC.
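
The effect of the sync read can be checked numerically in a toy model that counts 2 MHz ticks and assumes the CPU is in step with the 1 MHz clock on even ticks. Both the model and the name end_of_stretched_cycle are assumptions of this sketch. Whatever the starting phase, the sync read ends in step, so a later stretched access a fixed number of cycles away always costs the same.

```python
def end_of_stretched_cycle(tick):
    """Tick at which a stretched bus cycle that starts at `tick` finishes.

    Assumed model: the cycle lasts 2 ticks if it starts in step with the
    1 MHz clock (even tick) and 3 ticks otherwise.
    """
    return tick + (2 if tick % 2 == 0 else 3)


GAP = 4  # ordinary 2 MHz cycles between the sync read and the next access

for start in (10, 11):  # the two possible starting phases
    after_sync = end_of_stretched_cycle(start)  # the discarded sync read
    second_start = after_sync + GAP
    second_length = end_of_stretched_cycle(second_start) - second_start
    print(f"start {start}: sync ends at tick {after_sync}, "
          f"next stretched cycle lasts {second_length} ticks")
```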

Mechanism (low-level)

When the CPU’s address bus selects a 1 MHz address, IC 23 detects the match and raises pin 8. This connects to IC 33 pin 1, which gates the ΦIN signal feeding the 6502. The result: the current cycle is extended until the slow device has been given a full 1 MHz period to respond.

Devices on the I/O bus see normal 2 MHz timing; devices flagged as 1 MHz see slow timing; the CPU sees an extended cycle.

Why this matters for performance code

  • Polling a VIA timer / IFR is expensive. Every LDA &FE4D (System VIA IFR) is a stretched access. If you’re tight-polling for vsync (CA1 = bit 1) you’re paying 5-6c per check, not 4. Compare against using the IRQ.
  • CRTC reprogramming is expensive. Writing R12/R13 for a hardware scroll = 4 stretched accesses (STA &FE00, STA &FE01, STA &FE00, STA &FE01) = 20-24c, not 16. Negligible per frame but adds up for split-screen or per-line tricks.
  • The Video ULA is FAST. Palette flips for raster splits cost 4c per write — much cheaper than CRTC writes. Prefer the Video ULA for mid-frame work where possible.
  • VIA-based per-line timing is hard. Setting up a T1 timer to fire on a precise scanline involves stretched writes whose total varies by 1-2c. Phase-align with a sync read first.
  • The Tube is fast. Tube transfers via &FEE0-&FEFF are not stretched: the Tube ULA bridges the 2 MHz I/O bus and the parasite's own bus internally (the parasite clock speed depends on the second processor fitted). This is part of why Tube I/O can compete with main-RAM access for bulk transfers.
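
The cycle budgets quoted in the bullets above can be reproduced with a few lines of Python. This is only bookkeeping under the +1c/+2c rule: the instruction lists and the cost_range helper are illustrative names, and the LDA # instructions that load the register numbers and values are not counted.

```python
# Each entry: (instruction, base cycles, touches a 1 MHz device?)
SCROLL_WRITES = [
    ("STA &FE00", 4, True),  # select CRTC R12
    ("STA &FE01", 4, True),  # write R12
    ("STA &FE00", 4, True),  # select CRTC R13
    ("STA &FE01", 4, True),  # write R13
]
PALETTE_WRITE = [
    ("STA &FE21", 4, False),  # Video ULA palette, not stretched
]


def cost_range(sequence):
    """Return (best, worst) total cycles: +1c or +2c per stretched access."""
    best = sum(base + (1 if slow else 0) for _, base, slow in sequence)
    worst = sum(base + (2 if slow else 0) for _, base, slow in sequence)
    return best, worst


print("R12/R13 update:", cost_range(SCROLL_WRITES))
print("palette write: ", cost_range(PALETTE_WRITE))
```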

Comparison with the 1 MHz bus

The 1 MHz expansion bus (FRED &FC, JIM &FD) is the canonical “stretched” address space and the most discussed case (NAUG Ch23). What’s often missed is that the same mechanism applies to all SHEILA peripherals listed above. The 1 MHz bus is just the externally-exposed end of the same slow-bus arrangement.

See 1mhz-bus for the cartridge / FRED / JIM specifics, including the Master cartridge’s optional 2 MHz mode.

This wiki is curated by Claude following the LLM-Wiki methodology — a human curates source documents, the LLM compiles structured cross-linked markdown. Content may contain errors, omissions, or stale claims. For authoritative information refer to the original source documents in the bbc-documents GitHub archive.