Symmetry, Reduction, and Entropy in Deep Neural Networks

A geometric–statistical physics foundation for deep learning, from linear solvable limits to nonlinear depth-scaling laws.

Project Overview

This project develops a geometric and statistical–mechanical foundation for deep learning built on the structural chain:

Symmetry $\rightarrow$ Moment Map $\rightarrow$ Reduction $\rightarrow$ Entropy $\rightarrow$ Free Energy $\rightarrow$ Scaling Laws.

The starting point is a fully solvable linear limit — Deep Linear Networks (DLNs) — where symmetry is explicit and entropy can be computed from orbit volumes.

From this solvable template, we construct a nonlinear extension for realistic deep networks (e.g. equivariant models and residual networks), aiming to derive gauge-invariant macroscopic observables, microstate entropy formulas, and infinite-depth scaling laws.


Layer I — DLN as a Solvable Structural Baseline

For a depth-$L$ linear network:

\[X = W_L \cdots W_1\]

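the parameter space carries a natural gauge symmetry, acting through invertible matrices $Q_1, \dots, Q_{L-1}$ on the hidden layers:

\[(W_L,\dots,W_1) \mapsto (W_L Q_{L-1}, Q_{L-1}^{-1} W_{L-1} Q_{L-2}, \dots, Q_1^{-1} W_1)\]

leaving $X$ invariant, because each pair $Q_i Q_i^{-1}$ cancels in the product. This structure induces: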

  • A moment map (balanced condition)
  • A reduced manifold (balanced slice)
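  • Microstates: \(\mathcal{O}_X = \{ \theta : \Phi(\theta) = X \}\), the fiber over $X$ of the end-to-end map \(\Phi(\theta) = W_L \cdots W_1\) with \(\theta = (W_L,\dots,W_1)\)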
  • Entropy: \(S(X) = \log \text{vol}(\mathcal{O}_X)\)
  • Free energy: \(F_\beta(X) = E(X) - \beta^{-1} S(X)\)
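
The gauge invariance and the balanced condition can be checked numerically on a toy network. The following is a minimal sketch, not the project's implementation. It assumes the moment map takes the form commonly used for deep linear networks, $G_i = W_{i+1}^\top W_{i+1} - W_i W_i^\top$, with the balanced slice given by $G_i = 0$. The layer widths and the helper names `end_to_end`, `gauge_transform`, and `imbalance` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def end_to_end(weights):
    """Return X = W_L ... W_1 for weights listed as [W_1, ..., W_L]."""
    X = weights[0]
    for W in weights[1:]:
        X = W @ X
    return X

def gauge_transform(weights, gauges):
    """Apply W_i -> Q_i^{-1} W_i Q_{i-1}, with Q_0 = Q_L = identity.

    gauges lists the hidden-layer matrices [Q_1, ..., Q_{L-1}].
    """
    L = len(weights)
    out = []
    for i, W in enumerate(weights):  # weights[i] is W_{i+1}
        if i < L - 1:
            W = np.linalg.inv(gauges[i]) @ W   # left factor Q_{i+1}^{-1}
        if i > 0:
            W = W @ gauges[i - 1]              # right factor Q_i
        out.append(W)
    return out

def imbalance(weights):
    """Assumed moment map: G_i = W_{i+1}^T W_{i+1} - W_i W_i^T, i = 1..L-1."""
    return [weights[i + 1].T @ weights[i + 1] - weights[i] @ weights[i].T
            for i in range(len(weights) - 1)]

# Hypothetical layer widths d_0, ..., d_3 for a depth-3 network.
dims = [3, 4, 4, 2]
weights = [rng.normal(size=(dims[i + 1], dims[i])) for i in range(3)]

# General invertible gauges (small perturbations of the identity).
gl_gauges = [np.eye(d) + 0.1 * rng.normal(size=(d, d)) for d in dims[1:-1]]
X = end_to_end(weights)
X_gauged = end_to_end(gauge_transform(weights, gl_gauges))
print(np.allclose(X, X_gauged))

# Orthogonal gauges conjugate each G_i, so Frobenius norms are unchanged.
orth_gauges = [np.linalg.qr(rng.normal(size=(d, d)))[0] for d in dims[1:-1]]
norms_before = [np.linalg.norm(G) for G in imbalance(weights)]
norms_after = [np.linalg.norm(G)
               for G in imbalance(gauge_transform(weights, orth_gauges))]
print(np.allclose(norms_before, norms_after))
```

Under a general invertible gauge the product $X$ is unchanged but the imbalance $G_i$ is not. Under an orthogonal gauge $G_i$ is only conjugated, which is why its norm can serve as a gauge-invariant measure of distance from the balanced slice in this sketch.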

Layer II — Nonlinear Extension: Core Open Problems

The central research effort is to extend the symmetry–entropy mechanism to nonlinear deep networks.

OP2 — Macroscopic Observables Beyond Linear $X$

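In nonlinear networks, the end-to-end matrix $X$ alone is insufficient, because the linearization of the network function $f$ depends on the input $x$. We study the Jacobian observable at each input, which is symmetric positive semidefinite in general and positive definite (SPD) when $Df(x)$ has full column rank: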

\[g_x = (Df(x))^\top Df(x)\]
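
As a concrete illustration, the observable $g_x$ can be computed for a small network. The sketch below assumes a hypothetical two-layer tanh network $f(x) = W_2 \tanh(W_1 x)$ with random weights. The architecture, the sizes, and the names `jacobian` and `jacobian_gram` are illustrative choices, not part of the project. The analytic Jacobian is checked against central finite differences.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy network f(x) = W2 tanh(W1 x), mapping R^3 -> R^4.
W1 = rng.normal(size=(5, 3)) / np.sqrt(3)
W2 = rng.normal(size=(4, 5)) / np.sqrt(5)

def f(x):
    return W2 @ np.tanh(W1 @ x)

def jacobian(x):
    """Analytic Jacobian Df(x) = W2 diag(1 - tanh^2(W1 x)) W1, shape (4, 3)."""
    h = np.tanh(W1 @ x)
    return W2 @ ((1.0 - h**2)[:, None] * W1)

def jacobian_gram(x):
    """Return g_x = Df(x)^T Df(x), a symmetric 3 x 3 matrix."""
    J = jacobian(x)
    return J.T @ J

x = rng.normal(size=3)
g = jacobian_gram(x)
eigvals = np.linalg.eigvalsh(g)  # real eigenvalues in ascending order

# Central finite differences: column j approximates the derivative along e_j.
eps = 1e-6
J_fd = np.stack(
    [(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(3)],
    axis=1,
)

print("symmetric:", np.allclose(g, g.T))
print("eigenvalues of g_x:", eigvals)
print("matches finite differences:", np.allclose(J_fd, jacobian(x), atol=1e-6))
```

Because $g_x$ is a Gram matrix, its eigenvalues are the squared singular values of $Df(x)$, so it records how the network stretches input directions at $x$.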

OP3 — Nonlinear Microstates and Entropy

Define the microstates of a macroscopic observable value $y$ as the fiber of the parameter-to-observable map $\Phi$:

\[\mathcal{O}_y = \{ \theta : \Phi(\theta) = y \}\]

Questions: Is $\mathcal{O}_y$ an orbit or a symplectic reduced space? Can entropy be computed via Duistermaat–Heckman density structures?

OP7 — Weyl-Chamber Diffusion of Jacobian Spectra

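For deep residual chains with layer Jacobians $J_\ell$, so that $Df(x) = J_L \cdots J_1$, we ask what happens as $L \to \infty$:

  • Do the log-singular values of $Df(x)$ converge to a diffusion process in a Weyl chamber?
  • What are the drift and diffusion coefficients of that limiting process?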
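
These questions can be explored empirically before any limit theorem is available. The sketch below is a toy simulation under stated assumptions, not a result of the project. It takes each layer Jacobian to be $J_\ell = I + (\sigma/\sqrt{L})\, G_\ell$ with i.i.d. standard Gaussian $G_\ell$, a hypothetical residual scaling chosen so that the product has a nontrivial large-depth behavior. The function name `log_singular_path` and all parameter values are illustrative.

```python
import numpy as np

def log_singular_path(L, d=4, sigma=1.0, seed=0):
    """Log-singular values of J_l ... J_1 recorded at every depth l = 1..L.

    Assumed toy model: J_l = I + (sigma / sqrt(L)) * G_l with Gaussian G_l.
    Returns an array of shape (L, d); each row is sorted in descending order.
    """
    rng = np.random.default_rng(seed)
    P = np.eye(d)
    path = np.empty((L, d))
    for l in range(L):
        J = np.eye(d) + (sigma / np.sqrt(L)) * rng.normal(size=(d, d))
        P = J @ P
        path[l] = np.log(np.linalg.svd(P, compute_uv=False))
    return path

# Independent sample paths; depth plays the role of time t = l / L.
paths = np.stack([log_singular_path(200, seed=s) for s in range(20)])

# Empirical mean position at t = 1, a crude per-coordinate drift estimate.
mean_final = paths[:, -1, :].mean(axis=0)

# Smallest adjacent gap along each path; positive gaps mean the ordered
# values stayed in the interior of the chamber x_1 > x_2 > ... > x_d.
min_gaps = (-np.diff(paths, axis=-1)).min(axis=(1, 2))

print("mean log-singular values at full depth:", mean_final)
print("smallest adjacent gap over all paths:", min_gaps.min())
```

Sorting the log-singular values places each sample in the closed Weyl chamber by construction. The informative quantities are how the gaps behave with depth and how the means and variances scale with $t = \ell / L$, which is what the drift and diffusion coefficients would describe.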

Layer III — Deliverables for Deep Learning

1. Training Diagnostics

Measurable geometric quantities:

  • Jacobian SPD as geometric temperature
  • Moment-map imbalance as gauge instability
  • Spectral gap statistics as entropy indicators
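
One way to make the spectral-gap diagnostic concrete is to track adjacent gaps between the log-eigenvalues of a Jacobian Gram matrix. This is a minimal sketch under assumptions. The source does not fix a definition of the gap statistic, so the choice of log-eigenvalue gaps, the name `log_spectral_gaps`, and the random matrix standing in for $Df(x)$ are all hypothetical.

```python
import numpy as np

def log_spectral_gaps(g, floor=1e-12):
    """Adjacent gaps between sorted log-eigenvalues of a symmetric PSD matrix.

    Eigenvalues are clipped at `floor` so the logarithm stays finite.
    Returns a nonnegative array of length d - 1.
    """
    eig = np.linalg.eigvalsh(g)                  # ascending order
    log_eig = np.log(np.clip(eig, floor, None))
    return np.diff(log_eig)

rng = np.random.default_rng(2)
J = rng.normal(size=(6, 4))   # random stand-in for a Jacobian Df(x)
g = J.T @ J                   # Gram matrix g_x = Df(x)^T Df(x)

gaps = log_spectral_gaps(g)
summary = {
    "min_gap": gaps.min(),
    "mean_gap": gaps.mean(),
    "log_condition": gaps.sum(),  # log(lambda_max / lambda_min)
}
print(summary)
```

During training, such a summary could be logged per batch of inputs. The sum of the gaps equals the log condition number of $g_x$, so a growing total signals increasingly anisotropic stretching.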

2. Unified Conceptual Framework

Dropout, normalization, and implicit bias are all interpreted through a single chain: Symmetry + Reduction + Entropy + Free Energy + Scale.


Selected References

  • Menon & Yu. Entropy and Symmetry in Deep Linear Networks. arXiv (2023).
  • Poole et al. Exponential Expressivity in Deep Neural Networks Through Transient Chaos. NeurIPS (2016).
  • Amari. Information Geometry and Its Applications. Springer (2016).
  • Bronstein et al. Geometric Deep Learning. arXiv:2104.13478 (2021).