Genesis 1B: Training Progress, Live Results

Author: Robin, Kroonen AI Inc.

Tags: Genesis 1B, pretraining, rtx-4090, training live

⚡ Update, March 23, 2026

Training is past step 8,500 with loss at 1.42, approximately 43% of the way through the 20,000-step target. ETA to genesis-1b-v0.1-base: ~5 days.

Model: Genesis 1B

Parameters: 1,003M (1.0B)
Architecture: Llama-style decoder-only transformer
Hidden dim: 2048
Layers: 20
Attention heads: 16 (4 KV heads, GQA)
FFN dim: 5632 (SwiGLU)
Context length: 2048
Vocab size: 49,152
Precision: bfloat16
Positional encoding: RoPE (θ=500,000)
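
The 1,003M figure can be sanity-checked from the architecture numbers above. A minimal sketch, assuming tied input/output embeddings and RMSNorm-style norms (one weight vector per norm, no biases); neither assumption is stated in the post:

```python
# Sanity-check Genesis 1B's parameter count from its config.
# Assumes tied input/output embeddings and bias-free RMSNorms --
# both assumptions, not confirmed by the post.

vocab, hidden, layers = 49_152, 2048, 20
heads, kv_heads, ffn = 16, 4, 5632
head_dim = hidden // heads                    # 128

embed = vocab * hidden                        # shared with LM head if tied

# Attention with GQA: full-width Q and O, narrow K/V (4 KV heads).
attn = (hidden * hidden                       # q_proj
        + 2 * hidden * kv_heads * head_dim    # k_proj + v_proj
        + hidden * hidden)                    # o_proj

# SwiGLU FFN: gate, up, and down projections.
mlp = 3 * hidden * ffn

norms = 2 * hidden                            # two RMSNorms per layer

total = embed + layers * (attn + mlp + norms) + hidden  # + final norm
print(f"{total:,} params (~{total/1e9:.3f}B)")
```

This lands at 1,002,522,624 ≈ 1,003M, matching the table, which suggests the embeddings are indeed tied (an untied LM head would add another ~100M).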

Training Configuration

GPUs: 2× RTX 4090 (PCIe, no NVLink)
Batch size: 1 per GPU
Gradient accumulation: 64 steps
Effective batch: 262,144 tokens/step
Learning rate: 3e-4 → 3e-5 (cosine decay)
Warmup: 500 steps
Optimizer: AdamW (β1=0.9, β2=0.95, wd=0.1)
Throughput: ~6,600 tok/s
Target: 5.2B tokens (20,000 steps)
Estimated time: ~10 days
NCCL: NCCL_P2P_DISABLE=1
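
The effective batch and the time estimates follow directly from these numbers. A quick check:

```python
# Reproduce the effective batch size and run-time estimates
# from the training configuration above.

gpus, per_gpu_batch, grad_accum, seq_len = 2, 1, 64, 2048
tokens_per_step = gpus * per_gpu_batch * grad_accum * seq_len
assert tokens_per_step == 262_144            # matches "Effective batch"

target_steps, tok_per_s = 20_000, 6_600
total_tokens = target_steps * tokens_per_step        # ~5.24B
days = total_tokens / tok_per_s / 86_400             # ~9.2 days

# Remaining time as of the March 23 update (step 8,500):
remaining_days = (target_steps - 8_500) * tokens_per_step / tok_per_s / 86_400

print(f"{total_tokens/1e9:.2f}B tokens, ~{days:.1f} days total, "
      f"~{remaining_days:.1f} days remaining")
```

This gives ~9.2 days total (consistent with the "~10 days" estimate) and ~5.3 days remaining, matching the ~5-day ETA in the update.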

Smoke Test Results

Before committing to a multi-day run, the pipeline was tested methodically:

  1. Training only (no eval, no checkpoint): Verified training loop stability over 100+ steps. ✅
  2. Training + DCP checkpoint save: Ran 220 steps with --save-every 150. Sharded checkpoint saved at step 150 without deadlock. ✅
  3. Resume from checkpoint: Restarted with --resume, loaded DCP sharded state, continued training from step 150 to 300. Loss consistent with pre-save values. ✅
  4. Second checkpoint save: Step 300 save completed cleanly, overwriting the previous checkpoint. ✅
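
Under the assumption of a `train.py` entry point launched with `torchrun`, the staged smoke test looks roughly like the transcript below. The script name and `--max-steps` flag are illustrative; only `--save-every` and `--resume` appear in the post.

```shell
# NCCL workaround for PCIe-only RTX 4090s (no P2P):
export NCCL_P2P_DISABLE=1

# Stage 1: training loop only -- no eval, no checkpointing.
torchrun --nproc_per_node=2 train.py --max-steps 100

# Stage 2: add a DCP sharded checkpoint save mid-run.
torchrun --nproc_per_node=2 train.py --max-steps 220 --save-every 150

# Stage 3: resume from the sharded checkpoint at step 150,
# train to 300, and let the step-300 save overwrite it.
torchrun --nproc_per_node=2 train.py --max-steps 300 --save-every 150 --resume
```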

Training Progress: Live Results

The model has now trained well past the initial smoke test. Here is the full loss journey from step 0 to the current checkpoint:

Step     Loss      Step     Loss
0        11.17     3,400    2.73
200      4.87      3,600    2.42
400      4.34      3,800    2.45
600      3.55      4,000    2.25
800      3.03      4,200    2.35
1,000    3.27      4,400    2.19
1,200    3.02      4,600    2.46
1,400    3.02      4,800    2.10
1,600    2.94      5,000    2.39
1,800    2.74      5,500    2.26
2,000    2.54      6,000    2.20
2,200    2.36      6,500    2.15
2,400    2.44      7,000    1.90
2,600    2.54      7,500    1.69
2,800    2.62      8,000    1.53
3,000    2.68      8,500    1.42
3,200    2.48      (training live...)

Loss has dropped from 11.17 to 1.42 in 8,500 steps (~43% of the 20,000-step target), and the descent is not slowing. The model already shows emergent turn-taking structure in raw completions, before any instruction tuning or alignment. At this rate, sub-1.0 loss by step 20k is plausible. Each step processes 262,144 tokens, so the model has seen approximately 2.2B tokens so far out of a 60B-token corpus: less than 4% of the available data, which means zero repetition and no overfitting risk at this stage.
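
The tokens-seen arithmetic checks out:

```python
# Verify the "~2.2B tokens seen, <4% of corpus" claim at step 8,500.
steps_done, tokens_per_step, corpus = 8_500, 262_144, 60e9
seen = steps_done * tokens_per_step          # 2,228,224,000
print(f"{seen/1e9:.2f}B tokens seen, {100*seen/corpus:.1f}% of the corpus")
```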

The live tracker on the homepage pulls from latest.json and updates automatically as new checkpoints are saved. All checkpoints are archived as they are written.

Early Loss Curve

Step    Loss     tok/s
0       11.17    65,134
10      9.03     6,434
20      7.62     6,439
30      7.07     6,444
150     6.03     6,209
290     5.27     6,157

Loss dropped steadily from 11.17 to 5.27 over the first 300 steps. Both GPUs held 100% utilization at ~21 GB VRAM each, with temperatures under 50°C.

The Dataset

~60B tokens, curated from public sources.

All tokenized with a custom SentencePiece BPE tokenizer trained on the corpus itself.

The Road to Genesis 1B v0.1

Pre-training is only the first phase. The full pipeline has four stages, each producing a progressively better model:

Phase 1: Pre-training (current, ~43% complete)

Complete 20,000 steps, consuming approximately 5.2B tokens. This produces genesis-1b-v0.1-base: the raw pre-trained foundation. No instruction following, no alignment, no personality yet. Just a model that has learned the structure of language from a diverse corpus.

Phase 2: SFT (Supervised Fine-Tuning)

Teach the model conversational ability, personality, and curiosity using curated dialogue data. The approach is inspired by Anthropic's Constitutional AI: define a set of principles (be helpful, be curious, be honest, don't be boring) and train the model to follow them. This is where Genesis diverges from the standard safety-first fine-tuning pipeline. The goal is a model with genuine personality, not a model optimized for refusal rates.
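
The post doesn't show its SFT setup, but the standard mechanic is loss masking: the model sees the whole dialogue, yet the loss is computed only on the assistant's tokens. A minimal sketch with made-up token IDs (the -100 label is the conventional ignore-index in PyTorch-style cross-entropy):

```python
# Build SFT labels that mask out prompt tokens, so gradient flows
# only through the assistant's reply. Token IDs here are invented;
# -100 is the usual "ignore" label for cross-entropy loss.
IGNORE = -100

def make_labels(input_ids, prompt_len):
    """Copy targets for completion tokens, mask the prompt."""
    return [IGNORE] * prompt_len + input_ids[prompt_len:]

# e.g. "User: hi\nAssistant: hello there" -> first 5 tokens are prompt.
input_ids = [101, 7592, 102, 2023, 2003, 415, 416, 417]
labels = make_labels(input_ids, prompt_len=5)
print(labels)   # [-100, -100, -100, -100, -100, 415, 416, 417]
```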

Phase 3: DPO (Direct Preference Optimization)

Refine taste and style. Train the model to prefer interesting, thoughtful responses over generic safe ones. Preference pairs are constructed to reward curiosity and penalize hedging. This is what separates a model worth talking to from a model that merely answers questions.
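
DPO's objective is compact enough to state in a few lines. A sketch of the per-pair loss, assuming summed log-probabilities of the chosen and rejected responses under the policy and the frozen reference model (β is the usual temperature on the implicit reward; the example numbers are invented):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * margin), where the margin
    is how much more the policy prefers 'chosen' over 'rejected'
    relative to the reference model. Inputs are summed log-probs."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Policy prefers the curious answer more than the reference does:
loss = dpo_loss(pi_chosen=-12.0, pi_rejected=-20.0,
                ref_chosen=-14.0, ref_rejected=-18.0)
print(f"{loss:.4f}")   # ~0.5130; a zero margin would give log(2) ~ 0.693
```

Training pushes the margin positive on pairs where the "interesting" response is labeled chosen, which is exactly the reward-curiosity, penalize-hedging shaping described above.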

Phase 4: Continued pre-training cycles

Continue pre-training to 40,000 steps (~10.5B tokens), then run SFT and DPO again from the stronger base. Repeat at 60,000 and 80,000+ steps. Each cycle produces a better pre-trained foundation, which produces a better aligned model. The structure is a tree: the pre-training trunk keeps growing, and SFT/DPO branches off at each milestone checkpoint.

At 76,300 steps the model hits Chinchilla-optimal compute allocation for a 1B-parameter model (~20B tokens seen). The 60B-token corpus means zero data repetition out to roughly 228,000 steps. Every token the model sees during the extended runs is genuinely new data.
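
These milestone numbers follow directly from the 262,144 tokens/step batch:

```python
# Check the Chinchilla and corpus-headroom milestones.
tokens_per_step = 262_144
chinchilla_tokens = 20e9          # ~20 tokens per parameter for a 1B model
corpus = 60e9

chinchilla_step = chinchilla_tokens / tokens_per_step   # ~76,294
headroom_steps = corpus / tokens_per_step               # ~228,882

print(f"Chinchilla-optimal around step {chinchilla_step:,.0f}; "
      f"corpus lasts ~{headroom_steps:,.0f} steps without repetition")
```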

Try It Yourself

The model is training live. Select a checkpoint and generate text to see how it evolves over time:

Powered by Hugging Face ZeroGPU, with free inference on NVIDIA H200.

Contact

If you are a founder, independent researcher, or small lab working on multi-GPU local training and have encountered similar checkpoint or synchronization failures on consumer hardware, reach out at [email protected].