A crash introduction to BSD conjecture

The pdf version is A crash introduction to BSD conjecture .

We begin with the Weierstrass form of elliptic equation, i.e. look it as an embedding cubic curve in {\mathop{\mathbb P}^2}.

Definition 1 (Weierstrass form) {E \hookrightarrow \mathop{\mathbb P}^2 }, In general the form is given by,

\displaystyle E: y^2+a_1xy+a_3y=x^3+a_2x^2+a_4x+a_6 \ \ \ \ \ (1)

If {char F \neq 2,3}, then, we have a much more simper form,

\displaystyle y^2=x^3+ax+b, \Delta:=4a^3+27b^2\neq 0. \ \ \ \ \ (2)

Remark 1

\displaystyle \Delta(E)=\prod_{1\leq i,\neq j\leq 3}(z_i-z_j)

Where {z_i^3+az_i+b=0, \forall 1\leq i\leq 3}.

We have two way to classify the elliptic curve {E} living in a fix field {F}. \paragraph{j-invariant} The first one is by the isomorphism in {\bar F}. i.e. we say two elliptic curves {E_1,E_2} is equivalent iff

\displaystyle \exists \rho:\bar F\rightarrow \bar F

is a isomorphism such that {\rho(E_1)=E_2}.

Definition 2 (j-invariant) For a elliptic curve {E}, we have a j-invariant of {E}, given by,

\displaystyle j(E)=1728\frac{4a^3}{4a^3+27b^2} \ \ \ \ \ (3)

Why j-invariant is important, because j-invariant is the invariant depend the equivalent class of {E} under the classify of isomorphism induce by {\bar F}. But in one equivalent class, there also exist a structure, called twist.

Definition 3 (Twist) For a elliptic curve {E:y^2=x^3+ax+b}, all elliptic curve twist with {E} is given by,

\displaystyle E^{(d)}:y^2=x^3+ad^2x+bd^3 \ \ \ \ \ (4)

So the twist of a given elliptic curve {E} is given by:

\displaystyle H^1(Gal(\bar F/ F), Aut(E_{\bar F})) \ \ \ \ \ (5)

Remark 2 Of course a elliptic curve {E:y^2=x^3+ax+b} is the same as {E:y^2=x^3+ad^2x+bd^4}, induce by the map {\mathop{\mathbb P}^1\rightarrow \mathop{\mathbb P}^1, (x,y,1)\rightarrow (x,dy,1)}.

But this moduli space induce by the isomorphism of {F} is not good, morally speaking is because of the abandon of universal property. see \cite{zhang}. \paragraph{Level {n} structure} We need a extension of the elliptic curve {E}, this is given by the integral model.

Definition 4 (Integral model) {s:=Spec(\mathcal{O}_F)}, {E\rightarrow E_s}. {E_s} is regular and minimal, the construction of {E_s} is by the following way, we first construct {\widetilde{E_s} } and then blow up. {\widetilde E_s} is given by the Weierstrass equation with coefficent in {\mathcal{O}_F}.

Remark 3 The existence of integral model need Zorn’s lemma.

Definition 5 (Semistable) the singularity of the minimal model of {E} are ordinary double point.

Remark 4 Semistable is a crucial property, related to Szpiro’s conjecture.

Definition 6 (Level {n} structure)

\displaystyle \phi: ({\mathbb Z}/n{\mathbb Z})_s^2\longrightarrow E[N] \ \ \ \ \ (6)

{P=\phi(1,0), Q=\phi(o,1)} The weil pairing of {P,Q} is given by a unit in cycomotic fields, i.e. {<P,Q>=\zeta_N\in \mu_{N}(s)}

What happen if {k={\mathbb C}}? In this case we have a analytic isomorphism:

\displaystyle E({\mathbb C})\simeq {\mathbb C}/\Lambda \ \ \ \ \ (7)

Given by,

\displaystyle {\mathbb C}/\Lambda \longrightarrow \mathop{\mathbb P}^2 \ \ \ \ \ (8)

\displaystyle z\longrightarrow (\mathfrak{P}(z), \mathfrak{P}'(z), 1 ) \ \ \ \ \ (9)

Where {\mathfrak{P(z)}=\frac{1}{z^2}+\sum_{\lambda\in \Lambda,\lambda\neq 0}(\frac{1}{(z-\lambda)^2}-\frac{1}{\lambda^2})}, and the Weierstrass equation {E} is given by {y^2=4x^3-60G_4(\Lambda)x-140G_6(\Lambda)}. The full n tructure of it is given by {{\mathbb Z}+{\mathbb Z}\lambda} and the value of {P,Q}, i.e.

\displaystyle P=\frac{1}{N}, Q=\frac{\tau}{N} \ \ \ \ \ (10)

Where {\tau} is induce by

\displaystyle \Gamma(N):=ker(SL_2({\mathbb Z})\rightarrow SL_2({\mathbb Z}/n{\mathbb Z})) \ \ \ \ \ (11)

The key point is following:

Theorem 7 {k={\mathbb C}}, the moduli of elliptic curves with full level n-structure is identified with

\displaystyle \mu_N^*\times H/\Gamma(N) \ \ \ \ \ (12)

Now we discuss the Mordell-Weil theorem.

Theorem 8 (Mordell-Weil theorem)

\displaystyle E(F)\simeq {\mathbb Z}^r\oplus E(F)_{tor}

The proof of the theorem divide into two part:

  1. Weak Mordell-Weil theorem, i.e. {\forall m\in {\mathbb N}}, {E(F)/mE(F)} is finite.
  2. There is a quadratic function,

    \displaystyle \|\cdot\|: E(F)\longrightarrow {\mathbb R} \ \ \ \ \ (13)

    {\forall c\in {\mathbb R}}, {E(F)_c=\{P\in E(F), \|P\|<c\}} is finite.

Remark 5 The proof is following the ideal of infinity descent first found by Fermat. The height is called Faltings height, introduce by Falting. On the other hand, I point out, for elliptic curve {E}, there is a naive height come from the coefficient of Weierstrass representation, i.e. {\max\{|4a^3|,|27b^2|\}}.

While the torsion part have a very clear understanding, thanks to the work of Mazur. The rank part of {E({\mathbb Q})} is still very unclear, we have the BSD conjecture, which is far from a fully understanding until now.

But to understanding the meaning of the conjecture, we need first constructing the zeta function of elliptic curve, {L(s,E)}.

\paragraph{Local points} We consider a local field {F_v}, and a locally value map {F\rightarrow F_{\nu}}, then we have the short exact sequences,

\displaystyle 0\longrightarrow E^0(F_{\nu})\longrightarrow E(F_{\nu})=E_s(\mathcal{O}_F)\longrightarrow E_s(K_0)\longrightarrow 0 \ \ \ \ \ (14)

Topologically, we know {E(F_{\nu})} are union of disc indexed by {E_s(k_{\nu})},

\displaystyle |E_s(k_{\nu})| \sim q_{\nu}+1=\# \mathop{\mathbb P}^1(k_{\nu})

. Define {a_{\nu}=\# \mathop{\mathbb P}^1(k_{\nu})-|E_s(k_{\nu})|}, then we have Hasse principle:

Theorem 9 (Hasse principle)

\displaystyle |a_{\nu}|\leq 2\sqrt{q_{\nu}} \ \ \ \ \ (15)

Remark 6 I need to point out, the Hasse principle, in my opinion, is just a uncertain principle type of result, there should be a partial differential equation underlying mystery.

So count the points in {E(F)} reduce to count points in {H^1(F_{\nu},E(m))}, reduce to count the Selmer group {S(E)[m]}. We have a short exact sequences to explain the issue.

\displaystyle 0\longrightarrow E(F)/mE(F) \longrightarrow Sha(E)[m] \longrightarrow E(F)/mE(F)\longrightarrow 0 \ \ \ \ \ (16)

I mention the Goldfold-Szipiro conjecture here. {\forall \epsilon>0}, there {\exists C_{\epsilon}(E)} such that:

\displaystyle \# (E)\leq c_{\epsilon}(E)N_{E/{\mathbb Q}}(N)^{\frac{1}{2}+\epsilon} \ \ \ \ \ (17)

\paragraph{L-series} Now I focus on the construction of {L(s,E)}, there are two different way to construct the L-series, one approach is the Euler product.

\displaystyle L(s,E)=\prod_{\nu: bad}(1-a_{\nu}q_{\nu}^{-s})^{-1}\cdot \prod_{\nu:good}(1-a_{\nu}q_{\nu}^{-s}+q_{\nu}^{1-2s})^{-1} \ \ \ \ \ (18)


Where {a_{\nu}=0,1} or {-1} when {E_s} has bad reduction on {\nu}.

The second approach is the Galois presentation, one of the advantage is avoid the integral model. Given {l} is a fixed prime, we can consider the Tate module:

\displaystyle T_l(E):=\varprojlim_{l^n} E[l^n] \ \ \ \ \ (19)

Then by the transform of different embedding of {F\hookrightarrow \bar F}, we know { T_{l}(E)/Gal(\bar F/F)}, decompose it into a lots of orbits, so we can define {D_{\nu}}, the decomposition group of {w}(extension of {\nu} to {\bar F}). We define {I_{\nu}} is the inertia group of {D_{\nu}}.

Then {D_{\nu}/I_{\nu}} is generated by some Frobenius elements

\displaystyle Frob{\nu}x\equiv x^{q_{\nu}} (mod w),\forall x\in \mathcal{O}_{\bar Q} \ \ \ \ \ (20)

So we can define

\displaystyle L_{\nu}(s,E)=(1-q_{\nu}^{-s}Frob_{\nu}|T_{l}(E)^{I_{\nu}})^{-1} \ \ \ \ \ (21)

And then {L(s,E)=\prod_{\nu}L_{\nu}(s,E)}.

Faltings have proved {L_{\nu}(s,E)} is the invariant depending the isogenous class in the follwing meaning:

Theorem 10 (Faltings) {L_{\nu}(s,E)} is an isogenous ivariant, i.e. {E_1} isogenous to {E_2} iff {\forall a.e. \nu}, {L_{\nu}(s,E_1)=L_{\nu}(s,E_2)}.

\displaystyle L(s,E)=L(s-\frac{1}{2},\pi ) \ \ \ \ \ (22)

Where {\pi} come from an automorphic representation for {GL_2(A_F)}. Now we give the statement of BSD onjecture. {R} is the regulator of {E}, i.e. the volume of fine part of {E(F)} with respect to the Neron-Tate height pairing. {\Omega} be the volume of {\prod_{v|\infty}F(F_v)} Then we have,

  1. {ord_{s=1}L(s,E)=rank E(F)}.
  2. {|Sha(E)|<\infty}.
  3. {\lim_{s\rightarrow 0}L(s,E)(s-1)^{-rank(E)}=c\cdot \Omega(E)\cdot R(E)\cdot |Sha(E)|\cdot |E(F)_{tor}|^{-2}}

Here {c} is an explictly positive integer depending only on {E_{\nu}} for {\nu} dividing {N}.



SL_2(Z) and its congruence subgroups

The pdf version is SL_2(Z) and its congruence subgroups.

We know we can always do the following thing:

\displaystyle R\ commutative\ ring \longrightarrow \ "general\ linear\ group" \ GL_2(R) \ \ \ \ \ (1)


\displaystyle GL_2(R):=\{\begin{pmatrix} a & b \\ c & d \end{pmatrix}: det \begin{pmatrix} a & b \\ c & d \end{pmatrix}=R^*, a,b,c,d\in R\} \ \ \ \ \ (2)


Remark 1 Why it is {R^*} but not 1? if it is 1, then the action {R/GL_2(R)} distribute is not trasitive on {R}, i.e. every element in unite group present a connected component

Now we consider the subgroup {SL_2(R)\subset GL_2(R)}.

\displaystyle SL_2(R)=\{\begin{pmatrix} a & b\\ c & d \end{pmatrix}: det \begin{pmatrix} a & b\\ c & d \end{pmatrix}=1, a,b,c,d\in R \} \ \ \ \ \ (3)

We are most interested in the case {R={\mathbb Z}, {\mathbb Z}/n{\mathbb Z}}. So how to investigate {SL_2({\mathbb R})}? We can look at the action of it on something, for particular, we look at the action of it on Riemann sphere, i.e. { \hat {\mathbb C}/({\mathbb R})} given by fraction linear map:

\displaystyle g(z):=\frac{az+b}{cz+d}, g(\infty)=\frac{a}{c} \ \ \ \ \ (4)

Remark 2 What is fraction linear map? This action carry much more information than the action on vector, thanks for the exist of multiplication in {{\mathbb C}} and the algebraic primitive theorem. Due to I always looks the fraction linear map as something induce by the permutation of the roots of polynomial of degree 2, this is true at least for fix points, and could natural extension. So how about the higher dimension generate? consider the transform of {k-1} tuples induce by polynomial with degree {k}?

Remark 3

  1. {SL_2({\mathbb R})/\pm I:=PSL_2({\mathbb R})}, then {PSL_2({\mathbb R}) } action faithful on {\hat C}, i.e. except identity, every action is nontrivial. This is easy to be proved, observed,

    \displaystyle \frac{az+b}{cz+d}=z,\forall z\in \hat{\mathbb C}\Longrightarrow \begin{pmatrix} a & b\\ c & d \end{pmatrix}=\begin{pmatrix} 1 & 0\\ 0&1 \end{pmatrix} or \begin{pmatrix} -1&0\\ 0&-1 \end{pmatrix} \ \ \ \ \ (5)

  2. Up half plane {H} is invariant under the action of {PSL_2({\mathbb R})}, i.e. {\forall g\in PSL_2({\mathbb R})}, {gH=H}. The proof is following,
  3. \displaystyle \begin{array}{rcl} Im(\frac{az+b}{cz+d}) & = & Im(\frac{(az+b)(c\bar z+d)}{|cz+d|^2})\\ & = & Im(\frac{ac|z|^2+bc\bar z+adz+bd}{|cz+d|^2})\\ & > & 0,\ due\ to\ ad=bc+1. \end{array}


Now we focus on {SL_2({\mathbb Z})} or the same,{PSL({\mathbb Z})}. All the argument for {SL_2(R)} make sense for

\displaystyle \Gamma:= SL_2({\mathbb Z}), \bar \Gamma:=SL_2({\mathbb Z})/\pm I \ \ \ \ \ (6)

Fix {N\in {\mathbb N}}, define,

\displaystyle \Gamma(N):=\{\begin{pmatrix} a &b\\ c&d \end{pmatrix}, a,d \equiv 1(mod N), b,c\equiv 0(mod N)\} \ \ \ \ \ (7)

Then {\Gamma(N)} is the kernel of map {SL_2({\mathbb Z})\rightarrow SL_2({\mathbb Z}/n{\mathbb Z})}, i.e. we have short exact sequences,

\displaystyle 0\longrightarrow \Gamma(N)\longrightarrow \Gamma\longrightarrow SL_2({\mathbb Z}/n{\mathbb Z})\longrightarrow 0 \ \ \ \ \ (8)

Remark 4 The relationship of {\Gamma(N)\subset \Gamma} is just like {N{\mathbb Z}+1\subset {\mathbb Z}}.

Definition 1 (Congruence group) A subgroup of {\Gamma} is called a congruence group iff {\exists n\in {\mathbb N}}, {\Gamma(N)\subset G}.

Example 1 We give two examples of congruence subgroups here.

  1. \displaystyle \Gamma_1(N)=\{\begin{pmatrix} 1& *\\ 0 &1 \end{pmatrix} mod N\} \ \ \ \ \ (9)

  2. \displaystyle \Gamma_0(N)=\{\begin{pmatrix} * & *\\ 0 & * \end{pmatrix}mod N\} \ \ \ \ \ (10)


Definition 2 (Fundamental domain)

\displaystyle F=\{z\in H:-\frac{1}{2}\leq Re(z)\leq \frac{1}{2}\ and |z|\geq 1\} \ \ \ \ \ (11)

Now here is a theorem charistization the fundamental domain.

Theorem 3 This domain {F} is a fundamental domain of {\hat {\mathbb H}/({\mathbb R})}

Proof: Ths key point is {SL_2({\mathbb Z})} have two generators,

  1. {\tau_a: z\rightarrow z+a, \forall a\in {\mathbb Z}}.
  2. {s: z\rightarrow \frac{1}{z}}.

Thanks to this two generator exactly divide the action of {\Gamma} on {H} into a lots of scales, then {\Omega} is a fundamental domain is a easy corollary. \Box

Remark 5 This is not rigorous, {H} need be replace by {\hat H}, but this is very natural to get a modification to a right one.

Remark 6 {z_1,z_2\in \partial F} are {\Gamma} equivalent iff {Re(z)=\pm \frac{1}{2}} and {z_2=z_1\pm 1} or if {z_1} on the unit circle and {z_2=-\frac{1}{z_1}}

Remark 7 If {z\in F}, then {\Gamma_z=\pm I} expect in the following three case:

  1. {\Gamma=\pm \{\tau,s\}} if {z=i}.
  2. {\Gamma=\pm\{ I,s\tau, (s\tau)^2\}} if {z=w=-\frac{1}{2}+\frac{\sqrt{-3}}{2}}.
  3. {\Gamma=\pm\{I,\tau s, (\tau s)^2\}} if {z=-\bar w=\frac{1}{2}+\frac{\sqrt{-3}}{2}}.

Where {\tau=\tau_1}.

Remark 8 The group {\bar \Gamma=SL_2({\mathbb Z})/\pm I} is generated by the two elements {s}, {\tau}. In other word, any fraction linear transform is a “word” induce by {s,\tau,s^{-1}.\tau^{-1}}. But not free group, we have relationship {s^2=-I,(s\tau)^3=-I}.

The natural function space on {F} is the memorphic function, under the map: {H\rightarrow D-\{0\}}, it has a {q}-expension,

\displaystyle f(q)=\sum_{k\in {\mathbb Z}}a_kq^k \ \ \ \ \ (12)

And there are only finite many negative {k} such that {a_k\neq 0}.

Dirichlet hyperbola method

A pdf version is Dirichlet hyperbola method.

1. Introduction

Theorem 1

\displaystyle \sum_{1\leq n\leq x}d(n)=\sum_{1\leq n\leq x}[\frac{x}{n}]=xlogx+(2\gamma-1) x+O(\sqrt{x}) \ \ \ \ \ (1)


Remark 1 I thought this problem initial 5 years ago, cost me several days to find a answer, I definitely get something without the argument of Dirchlet hyperbola method and which is weaker but morally the same camparable with the result get by Dirichlet hyperbola method.

Remark 2 How to get the formula:

\displaystyle \sum_{1\leq n\leq x}d(x)=\sum_{1\leq n\leq x}[\frac{x}{n}]? \ \ \ \ \ (2)

In fact,

\displaystyle \sum_{1\leq n\leq x}d(x)=\sum_{1\leq ab\leq x}1=\sum_{1\leq n\leq x}[\frac{x}{n}]\ \ \ \ \ (3)

Which is the integer lattices under or lying on the hyperbola {\{(a,b)|ab=x\}}.

Remark 3 By trivial argument, we can bound the quantity as following way,

\displaystyle \begin{array}{rcl} \sum_{1\leq n\leq x}[\frac{x}{n}] & = & \sum_{1\leq ab\leq x}1\\ & = & x\sum_{i=1}^x\frac{1}{i}-\sum_{i=1}^x\{\frac{x}{i}\}\\ & = &xlnx+\gamma x+O(x) \end{array}

The error term is {O(x)}, which is too big. But fortunately we can use the symmetry of hyperbola to improve the error term.


\displaystyle \begin{array}{rcl} \sum_{1\leq n\leq x}d(n) & = & \sum_{ab\leq x}1\\ & = & \sum_{a\geq \sqrt{x}}[\frac{x}{b}]+\sum_{b\geq \sqrt{x}}[\frac{x}{a}]-\sum_{1\leq a,b\leq \sqrt{x}}1\\ & = & xlogx+(2\gamma-1)x+O(\sqrt{x}) \end{array}


Theorem 2

Given a natural number k, use the hyperbola method together
with induction and partial summation to show that

\displaystyle \sum_{n\leq x}d_k(n) = xP_k(log x) + O(x^{1-\frac{1}{k}+\epsilon}), n\leq x \ \ \ \ \ (4)

where {P_k(t)} denotes a polynomial of degree {k-1} with leading term {\frac{t^{k-1}}{(k-1)!}}.

Remark 4 {P_k(x)} is the residue of {\zeta(s)^kx^ss^{-1}} at {s=1}.


We can establish the dimension 3 case directly, which is the following asymptotic formula,

\displaystyle \sum_{1\leq xy\leq n}[\frac{n}{xy}]=xP_2(logx)+O(x^{1-\frac{1}{3}+\epsilon}) \ \ \ \ \ (5)

The approach is following, we first observe that

\displaystyle \sum_{1\leq xy\leq n}[\frac{n}{xy}]=\sum_{xyz\leq n}1 \ \ \ \ \ (6)

The problem transform to get a asymptotic formula for the lattices under 3 dimension hyperbola. The first key point is, morally {([n^{\frac{1}{3}}],[n^{\frac{1}{3}}],[n^{\frac{1}{3}}])} is the central point under the hyperbola.

Then we can divide the range into 3 parts, and try to get a asymptotic formula for each part then add them together. Assume we have:

  1. {A_x=\sum_{1\leq r\leq [n^{\frac{1}{3}}]}\sum_{1\leq yz\leq [n^{\frac{2}{3}}]}[\frac{r}{yz}]}.
  2. {A_y=\sum_{1\leq r\leq [n^{\frac{1}{3}}]}\sum_{1\leq xz\leq [n^{\frac{2}{3}}]}[\frac{r}{yz}]}.
  3. {A_z=\sum_{1\leq r\leq [n^{\frac{1}{3}}]}\sum_{1\leq xy\leq [n^{\frac{2}{3}}]}[\frac{r}{yz}]}.

Then the task transform to get a asymptotic formula,

\displaystyle A_x=A_y=A_z=xQ_2(logx)+O(x^{1-\frac{1}{3}+\epsilon}) \ \ \ \ \ (7)

But we can do the same thing for {\sum_{1\leq yz\leq [n^{\frac{2}{3}}]}[\frac{r}{yz}]} and then integral it. This end the proof. For general {k\in {\mathbb N}}, the story is the same, by induction.

Induction on {k} and use the Fubini theorem to calculate {\sum_{x_1...x_r\leq n}\frac{n}{x_1...x_r},\forall 1\leq r\leq k}. \Box

There is a major unsolved problem called Dirichlet divisor problem.

\displaystyle \sum_{n\leq x}d(n) \ \ \ \ \ (8)

What is the error term? The conjecture is the error term is {O(x^{\theta}), \forall \theta>\frac{1}{4}}, it is known that {\theta=\frac{1}{4}} is not right.

Remark 5

To beats this problem, need some tools in algebraic geometry.

2. Several problems

{\forall k\in {\mathbb N}}, is there a asymptotic formula for {\sum_{t=1}^n\{\frac{kn}{t}\}} ?

{\forall k\in {\mathbb N}}, {f(n)} is a polynomial with degree {k}, is there a asymptotic formula for {\sum_{t=1}^n\{\frac{f(n)}{t}\}} ?

{\forall k\in {\mathbb N}}, {g(n)} is a polynomial with degree {k}, is there a asymptotic formula for {\sum_{t=1}^n\{\frac{n}{g(t)}\}} ?

Theorem 3 {k\in {\mathbb N}}, then we have

\displaystyle \lim_{n\rightarrow \infty}\frac{\{\frac{kn}{1}\}+\{\frac{kn}{2}\}+...+\{\frac{kn}{n}\}}{n}=k(\sum_{i=1}^k\frac{1}{i}-lnk-\gamma) \ \ \ \ \ (9)


\displaystyle \begin{array}{rcl} \frac{\{\frac{kn}{1}\}+\{\frac{kn}{2}\}+...+\{\frac{kn}{n}\}}{n} & = & \frac{\sum_{i=1}^k\frac{kn}{i}-\sum_{i=1}^n[\frac{kn}{i}]}{n}\\ & = & k(lnn+\gamma +\epsilon_n)-\frac{\sum_{i=1}^{kn}[\frac{kn}{i}]-\sum_{i=n+1}^{kn}[\frac{kn}{i}]}{n} \end{array}


Now we try to estimate

\displaystyle S_k(n)=\sum_{i=1}^{kn}[\frac{kn}{i}]-\sum_{i=n+1}^{kn}[\frac{kn}{i}] \ \ \ \ \ (10)

In fact, we have,

\displaystyle \begin{array}{rcl} S_k(n) & = & (2\sum_{i=1}^{[\sqrt{kn}]}[\frac{kn}{i}]-[\sqrt{kn}]^2)-(\sum_{i=1}^k[\frac{kn}{i}]-kn)\\ & = & 2\sum_{i=1}^{[\sqrt{kn}]}\frac{kn}{i}-\sum_{i=1}^k\frac{kn}{i}+2\{\sqrt{kn}\}[\sqrt{kn}]+\{\sqrt{kn}\}^2-2\sum_{i=1}^{[\sqrt{kn}]}\{\frac{kn}{i}\}+\sum_{i=1}^k\{\frac{kn}{i}\}\\ & = & 2kn(ln[\sqrt{kn}]+\gamma+\epsilon_{[\sqrt{kn}]})-kn\sum_{i=1}^k\frac{1}{i}+r(n)\\ & = & knln(kn)+kn(2\gamma-\sum_{i=1}^k\frac{1}{i})+r'(n)\\ & = & knln n+kn(2\gamma+lnk-\sum_{i=1}^k\frac{1}{i})+r'(n) \end{array}

Where {-3\sqrt{n}<r(n)<3\sqrt{n}}, {-3\sqrt{n}<r'(n)<3\sqrt{n}}.

So by 1 we know,

\displaystyle \begin{array}{rcl} \frac{\{\frac{kn}{1}\}+...+\{\frac{kn}{n}\}}{n} & = & k(lnn+\gamma+\epsilon_n)-klnn-k(2\gamma+lnk-\sum_{i=1}^k\frac{1}{i})+\frac{r'(n)}{n}\\ & = & k(\sum_{i=1}^k\frac{1}{i}-lnk-\gamma)+\frac{r'(n)}{n}+k\epsilon_n \end{array}

So we have,

\displaystyle \lim_{n\rightarrow \infty}\frac{\{\frac{kn}{1}\}+...+\{\frac{kn}{n}\}}{n} =k(\sum_{i=1}^k\frac{1}{i}-lnk-\gamma)=k\epsilon_k \ \ \ \ \ (11)

Remark 6 In fact we can get {0<k\epsilon_k<\frac{1}{2}, \forall k\in {\mathbb N}}, by combining the theorem 3 and 1.

3. Lattice points in ball

Gauss use the cube packing circle get a rough estimate,

\displaystyle \sum_{n\leq x}r_2(n)=\pi x+O(\sqrt{x}) \ \ \ \ \ (12)


In the same way one can obtain,

\displaystyle \sum_{n\leq x}r_k(n)=\rho_kx^{\frac{k}{2}}+O(x^{\frac{k-1}{2}}) \ \ \ \ \ (13)

Remark 7 Where {\rho_k=\frac{\pi^{\frac{k}{2}}}{\Gamma(\frac{k}{2}+1)}} is the volume of the unit ball in {k} dimension.

Dirchlet’s hyperbola method works nicely for the lattic points in a ball of dimension {k\geq 4}. Langrange proved that every natural number can be represented as the sum of four squares, i.e. {r_4(n)>0}, and Jacobi established the exact formula for the number of representations

\displaystyle r_4(n)=8(2+(-1)^n)\sum_{d|n,d\ odd}d. \ \ \ \ \ (14)

Hence we derive,

\displaystyle \begin{array}{rcl} \sum_{n\leq x}r_4(n) & = & 8\sum_{m\leq x}(2+(-1)^m)\sum_{dm\leq x, d\ odd}d\\ & = & 8\sum_{m\leq x}(2+(-1)^m)(\frac{x^2}{4m^2}+O(\frac{x}{m}))\\ & = & 2x^2\sum_1^{\infty}(2+(-1)^m)m^{-2}+O(xlogx)\\ & = & 3\zeta(2)x^2+O(xlogx) = \frac{1}{2}(\pi x)^2+O(xlogx) \end{array}

This result extend easily for any {k\geq 4}, write {r_k} as the additive convolution of {r_4} and {r_{k-4}}, i.e.

\displaystyle r_k(n)=\sum_{0\leq t\leq n}r_4(t)r_{k-4}(n-t) \ \ \ \ \ (15)

Apply the above result for {r_4} and execute the summation over the remaining {k-4} squares by integration.

\displaystyle \sum_{n\leq x}r_k(n)=\frac{(\pi x)^{\frac{k}{2}}}{\Gamma(\frac{k}{2}+1)}+O(x^{\frac{k}{2}-1}logx) \ \ \ \ \ (16)


Remark 8 Notice that this improve the formula 12 which was obtained by the method of packing with a unit square. The exponent {\frac{k}{2}-1} in 16 is the best possible because the individual terms of summation can be as large as the error term (apart from {logx}), indeed for {k=4} we have {r_4(n)\geq 16n} if {n} is odd by the Jacobi formula. The only case of the lattice point problem for a ball which is not yet solved (i.e. the best possible error terms are not yet established) are for the circle({k=2}) and the sphere ({k=3}).

Theorem 4

\displaystyle \sum_{n\leq x}\tau(n^2+1)=\frac{3}{\pi}xlogx+O(x) \ \ \ \ \ (17)

4. Application in finite fields

Suppose {f(x)\in {\mathbb Z}[x]} is a irreducible polynomial. And for each prime {p}, let

\displaystyle \rho_f(p)=\# \ of \ solutions\ of f(x)\equiv 0(mod\ p) \ \ \ \ \ (18)

By Langrange theorem we know {\rho_f(p)\leq deg(f)}. Is there a asymptotic formula for

\displaystyle \sum_{p\leq x}\rho_f(p)? \ \ \ \ \ (19)

A general version, we can naturally generated it to algebraic variety.

\displaystyle \rho_{f_1,...,f_k}(p)=\#\ of \ solutions\ of f_i(x)\equiv 0(mod\ p) ,\ \forall 1\leq i\leq k \ \ \ \ \ (20)

Is there a asymptotic formula for

\displaystyle \sum_{p\leq x}\rho_{f_1,...,f_k}(p)? \ \ \ \ \ (21)

Example 1 We give an example to observe what is involved. {f(x)=x^2+1}. We know {x^2+1\equiv 0 (mod \ p)} is solvable iff {p\equiv 1 (mod\ 4)} or {p=2}. One side is easy, just by Fermat little theorem, the other hand need Fermat descent procedure, which of course could be done by Willson theorem. In this case,

\displaystyle \sum_{p\leq n}\rho_f(p)=\# \ of \{primes \ of \ type\ 4k+1 \ in \ 1,2,...,n\} \ \ \ \ \ (22)

Which is a special case of Dirichlet prime theorem.

Let {K} be an algebraic number field, i.e. the finite field extension of rational numbers, let

\displaystyle \mathcal{O}_K=\{\alpha\in K, \alpha \ satisfied \ a\ monic \ polynomial\ in\ {\mathbb Z}[x]\} \ \ \ \ \ (23)


Dedekind proved that,

Theorem 5

  1. {\mathcal{O}_K} is a ring, we call it the ring of integer of {K}.
  2. He showed further every non-zero ideal of {\mathcal{O}_K} could write as the product of prime ideal in {\mathcal{O}_k} uniquely.
  3. the index of every non-zero ideal {I} in {\mathcal{O}_K} is finite, i.e. {[\mathcal{O}_K:I]<\infty}, and we can define the norm induce by index.

    \displaystyle N(I):=[\mathcal{O}_K:I] \ \ \ \ \ (24)

    Then the norm is a multiplication function in the space of ideal, i.e. {N(IJ)=N(I)N(J), \forall I,J \in \ ideal\ class\ group\ of\ \mathcal{O}_K}.

  4. Now he construct the Dedekind Riemann zeta function,

    \displaystyle \zeta_K(s)=\sum_{N(I)\neq 0}\frac{1}{N(I)^s}=\prod_{J\ prime \ ideal\ }\frac{1}{1-\frac{1}{N(J)^s}},\ \forall Re(s)>1 \ \ \ \ \ (25)


Now we consider the analog of the prime number theorem. Let {\pi_K(x)=\{I,N(I)<x\}}, does the exist a asymptotic formula,

\displaystyle \pi_K(x)\sim \frac{x}{ln x}\ as\ x\rightarrow \infty? \ \ \ \ \ (26)

Given a prime {p}, we may consider the prime ideal

\displaystyle p\mathcal{O}_K=\mathfrak{P}_1^{e_1}\mathfrak{P}_2^{e_2}...\mathfrak{P}_k^{e_k} \ \ \ \ \ (27)

Where {\mathfrak{P}_i } is different prime ideal in {\mathcal{O}_K}. But the question is how to find these {\mathfrak{P}_i}? For the question, there is a satisfied answer.

Lemma 6 (existence of primitive element) There always exist a primetive elements in {K}, such that,

\displaystyle K={\mathbb Q}(\theta) \ \ \ \ \ (28)

Where {\theta} is some algebraic number, which’s minor polynomial {f(x)\in {\mathbb Z}[x]}.

Theorem 7 (Dedekind recipe) Take the polynomial {f(x)}, factorize it in the polynomial ring {{\mathbb Z}_p[x]},

\displaystyle f(x)\equiv f_1(x)^{e_1}...f_{r}(x)^{e_r}(mod \ p) \ \ \ \ \ (29)

Consider {\mathfrak{P}_i=(p, f_i(\theta)) \subset \mathcal{O}_K}. Then apart from finite many primes, we have,

\displaystyle p\mathcal{O}_K=\mathfrak{P}_1^{e_1}\mathfrak{P}_2^{e_2}...\mathfrak{P}_k^{e_k} \ \ \ \ \ (30)

Where {N(\mathfrak{P}_i)=p^{deg{f_i}}}.

Remark 9 The apart primes are those divide the discriminant.

Now we can argue that 4 is morally the same as counting the ideals whose norm is divide by {p} in a certain algebraic number theory.

And we have following, which is just the version in algebraic number fields of 2.

Theorem 8 (Weber) {\#} of ideals of {\mathcal{O}_K} with norm {\leq x} equal to,

\displaystyle \rho_k(X)+O(x^{1-\frac{1}{d}}), where \ d=[K:Q] \ \ \ \ \ (31)


Diophantine approximation

I explain some general ideal in the theory of diophantine approximation, some of them is original by myself, begin with a toy model, then consider the application on folklore Swirsing-Schmidt conjecture.


1. Dirichlet theorem, the toy model

The very basic theorem in the theory of Diophantine approximation is the well known Dirichlet approximation theorem, the statement is following.

Theorem 1 (Dirichlet theorem) for all {\alpha} is a irrational number, we have infinity rational number {\frac{q}{p}} such that:

\displaystyle |\alpha-\frac{q}{p}|<\frac{1}{p^2} \ \ \ \ \ (1)

Remark 1 It is easy to see the condition of irrational is crucial. There is a best constant version of it, said, instead of {1}, the best constant in the suitable sense for the theorem 1 should be {\frac{1}{\sqrt{5}}} and arrive by {\frac{\sqrt{5}+1}{2}} at least. The strategy of the proof of the best constant version involve the Frey sequences.

Now we begin to explain the strategies to attack the problem.

\paragraph{Argument 1, boxes principle} We begin with a easiest one, i.e. by the argument of box principle, the box principle is following,

Theorem 2 (Boxes principle) Given {n\in {\mathbb N}} and two finite sets {A={a_1,a_2,...,a_n,a_{n+1}}}, set {B={b_1,...,b_{n}}}, if we have a map:

\displaystyle f:A\longrightarrow B \ \ \ \ \ (2)

Then there exists a element {b_k\in B} such that there exist at least two element {a_i,a_j\in A}, {f(a_i)=f(a_j)=b_k}.

Proof: The proof is trivial. \Box

Now consider, {\forall N\in {\mathbb N}}, the sequences {x,2x,...,Nx}, then {\{ix\}\in [0,1], \forall i\in \{1,2,...,n\}}. Divide {[0,1]} in an average way to {N} part: {[\frac{k-1}{N},\frac{k}{N}]}. Then the linear structure involve (which, in fact play a crucial role in the approach). And the key point is to look at {\{nx\}} and integers.

\paragraph{Argument 2, continue fractional} We know, for irrational number {x}, {x} have a infinite long continue fractional:

\displaystyle x=q_0+\frac{1}{q_1+\frac{1}{q_2+\frac{1}{q_3+....+\frac{1}{q_k+...}}}} \ \ \ \ \ (3)


\displaystyle |x-q_0+\frac{1}{q_1+\frac{1}{q_2+\frac{1}{q_3+....+\frac{1}{q_k}}}}|\sim \frac{1}{(q_1q_2...q_{k-1})^2q_k} \ \ \ \ \ (4)

And we have,

\displaystyle \frac{1}{q_1+\frac{1}{q_2+\frac{1}{q_3+....+\frac{1}{q_k}}}}=\frac{a_n}{b_n}, (a_n,b_n)=1 \ \ \ \ \ (5)

Then {b_n=O(q_1...q_k)}.

\paragraph{Argument 3, Bohr set argument} We begin with some kind of Bohr set:

\displaystyle B_p=I-\cup_{q\in \{0,1,...,p-1\}}(\frac{q}{p}-\frac{1}{p^2},\frac{q}{p}+\frac{1}{p^2}) \ \ \ \ \ (6)

The key point is the shift of Bohr set, on the vertical line i.e. {|B_p\cap B_{p+1}|} is very slow, and can be explained by

\displaystyle \frac{k}{p+1}+\frac{1}{(p+1)^2}>\frac{k}{p}-\frac{1}{p^2} \ \ \ \ \ (7)


\displaystyle \frac{1}{p^2}+\frac{1}{(p+1)^2}>\frac{k}{p(p+1)} \ \ \ \ \ (8)

in {|B_p \cap B_{p+1}|\sim \frac{1}{p(p+1)}} But in fact they are not really independent, as the number of Bohr sets increase, then you can calculate the correlation, thanks to the harmonic sires increasing very slowly, wwe can get something non trivial by this argument, but it seems not enough to cover the whole theorem 1.

\paragraph{Argument 4, mountain bootstrap argument} This argument is more clever than 3, although both two arguments try to gain the property we want in 1 from investigate the whole space {[0,1]} but not {x}, this argument is more clever.

Now I explain the main argument, it is nothing but sphere packing, with the set of balls

\displaystyle \Omega=\{B_{p,q}:=(\frac{q}{p}-\frac{1}{q^2},\frac{q}{p}+\frac{1}{p^2})| \forall p\in {\mathbb N}, 1\leq q\leq p-1 \} \ \ \ \ \ (9)

and define its subset

\displaystyle \Omega_l=\{B_{p,q}:=(\frac{q}{p}-\frac{1}{q^2},\frac{q}{p}+\frac{1}{p^2})| \forall 1\leq p\leq l, 1\leq q\leq p-1 \} \ \ \ \ \ (10)

Then {\Omega_l\subset \Omega}, and {\Omega =\cup_{l\in {\mathbb N}}\Omega_l}. If we can proof,

Lemma 3 For all {l\in {\mathbb N}}, there is a subset {A_l} of {\Omega-\Omega_l} such that {\cup_{i\in A_l}B_i=[0,1]}.

Remark 2 If we can proof 3, it is easy to see the theorem 1 follows.

Proof: The proof follows very standard in analysis, may be complex analysis? Key point is we start with a ball {B_{p,q}}, whatever it is, this is not important, the important thing is we can take some ball {B_{p',q'}} with the center of {B_{p',q'}} in {B_{p,q}}, then try to consider {B_{p',q'}\cup B_{p,q}} to extension {B_{p,q}} and then we find the boudary is also larger then we can extension again, step by step just like mountain bootstrap argument. So we involve in two possible ending,

  1. The extension process could extension {B{p,q}} to whole space.
  2. we can not use the extension argument to extension to the whole space.

If we are in the first situation, then we are safe, there is nothing need proof. If we are in second case, anyway we take a ball {B_{p,q}=(\frac{q}{p}-\frac{1}{p^2},\frac{q}{p}+\frac{1}{p^2})}. Then try to find good ball {B_{p',q'}} to approximate {B_{p,q}}, but this is difficult… \Box

Remark 3 Argument 1 is too clever to be true in generalization, argument 2 is standard, by the power of renormalization. argument 3 and argument 4 have gap… I remember I have got a proof similar to argument 4 here many years ago, but I forgot how to get it…

2. Schimidt conjecture

The Schimidt conjecture could be look as the generalization of Dirchlet approximation theorem 1 to algebraic number version, to do this, we need define the height of a algebraic number.

Definition 4 We say a number {\alpha\in {\mathbb C}} is a {k-}order algebraic number if and only is the minimal polynomial of {\alpha}, {f(x)=a_nx^n+...+a_1x+a_0, a_n\neq 0} have degree {deg(f)=n, f\in {\mathbb Z}[x]}.

Definition 5 (Height) Now we define the height of a {k-}th order algebraic number as {H(\alpha):=\max\{\|a_n\|_h,\|a_{n-1}\|_h,...,\|a_0\|_h\}}, Where

\displaystyle h(a_i)=\|a_i\|_{\infty} \ \ \ \ \ (11)

Now we state the conjecture:

Theorem 6 (Swiring-Schimidt conjecture) For all transendental number {x\in {\mathbb C}}, there is infinitely {\alpha} are {k-}th algebraic number such that:

\displaystyle |x-\alpha|<\frac{c_k}{H(\alpha)^{k+1}} \ \ \ \ \ (12)

Where {c_k} is a constant only related to {k} but not {x}.

I point out the conjecture is very related to the map:

\displaystyle F:(x_1,...,x_n) \longrightarrow (\sigma_1(x_1,...,x_n),\sigma_2(x_1,...,x_n),...,\sigma_n(x_1,...,x_n)) \ \ \ \ \ (13)

Where {\sigma_k(x_1,...,x_n)=\sum_{1\leq i_1<...<i_k\leq n}\Pi_{j=1}^kx_{i_1}x_{i_2}...x_{i_k}} is the {k-}th symmetric sum.

Remark 4 {F} is a map {{\mathbb C}^n\rightarrow {\mathbb C}^n}, what we consider is its inverse, {G=F^{-1}}, but {G} is not smooth, it occur singularity when {x_i=x_j} for some {i\neq j}. And the map, as we know, the singularity depend on the quantity {\Pi_{1\leq i< j\leq n}(x_i-x_j)}.

Remark 5 I then say something about the geometric behaviour of the map {G}, as we know, what we have in mind is consider the map {G} as a distortion {{\mathbb C}^n\rightarrow {\mathbb C}^n}, Then {H(\alpha)} is just the pullback of the canonical metric on {{\mathbb C}}(morally) to {{\mathbb C}}.

Discrete harmonic function in Z^n

There is some gap, in fact I can improve half of the argument of Discrete harmonic function , the pdf version is Discrete harmonic function in Z^n, but I still have some gap to deal with the residue half…


1. The statement of result

First of all, we give the definition of discrete harmonic function.

Definition 1 (Discrete harmonic function) We say a function {f: {\mathbb Z}^n \rightarrow {\mathbb R}} is a discrete harmonic function on {{\mathbb Z}^n} if and only if for any {(x_1,...,x_n)\in {\mathbb Z}^n}, we have:

\displaystyle f(x_1,...,x_n)=\frac{1}{2^n}\sum_{(\delta_1,...,\delta_n )\in \{-1,1\}^n}f(x_1+\delta_1,...,x_n+\delta_n ) \ \ \ \ \ (1)


In dimension 2, the definition reduce to:

Definition 2 (Discrete harmonic function in {{\mathbb R}^2}) We say a function {f: {\mathbb Z}^2 \rightarrow {\mathbb R}} is a discrete harmonic function on {{\mathbb Z}^2} if and only if for any {(x_1,x_2)\in {\mathbb Z}^2}, we have:

\displaystyle f(x_1,x_2)=\frac{1}{4}\sum_{(\delta_1,\delta_2)\in \{-1,1\}^2}f(x_1+\delta_1,x_2+\delta_2 ) \ \ \ \ \ (2)


The result establish in \cite{paper} is following:

Theorem 3 (Liouville theorem for discrete harmonic functions in {{\mathbb R}^2}) Given {c>0}. There exists a constant {\epsilon>0} related to {c} such that, given a discrete harmonic function {f} in {{\mathbb Z}^2} satisfied for any ball {B_R(x_0)} with radius {R>R_0}, there is {1-\epsilon} portion of points {x\in B_R(x_0)} satisfied {|f(x)|<c}. then {f} is a constant function in {{\mathbb Z}^2}.

Remark 1 This type of result contradict to the intuition, at least there is no such result in {{\mathbb C}}. For example. the existence of poisson kernel and the example given in \cite{paper} explain the issue.

Remark 2 There are reasons to explain why there could not have a result in {{\mathbb C}} but in {{\mathbb Z}^2},

  1. The first reason is due to every radius {R} there is only {O(R^2)} lattices in {B_R(x)} in {{\mathbb Z}^2} so the mass could not concentrate very much in this setting.
  2. The second one is due to there do not have infinite scale in {{\mathbb Z}^2} but in {{\mathbb C}}.
  3. The third one is the function in {{\mathbb Z}^2} is automatically locally integrable.


The generation is following:

Theorem 4 (Liouville theorem for discrete harmonic functions in {{\mathbb R}^n}) Given {c>0,n\in {\mathbb N}}. There exists a constant {\epsilon>0} related to {n,c} such that, given a discrete harmonic function {f} in {{\mathbb Z}^n} satisfied for any ball {B_R(x_0)} with radius {R>R_0}, there is {1-\epsilon} portion of points {x\in B_R(x_0)} satisfied {|f(x)|<c}. then {f} is a constant function in {{\mathbb Z}^n}.

In this note, I give a proof of 4, and explicit calculate a constant {\epsilon_n>0} satisfied the condition in 3, this way could also calculate a constant {\epsilon_n} satisfied 4. and point the constant calculate in this way is not optimal both in high dimension and 2 dimension.

2. some element properties with discrete harmonic function

We warm up with some naive property with discrete harmonic function. The behaviour of bad points could be controlled, just by isoperimetric inequality and maximum principle we have following result.

Definition 5 (Bad points) We divide points of {{\mathbb Z}^n} into good part and bad part, good part {I} is combine by all point {x} such that {|f(x)|<c}, and {J} is the residue one. So {A\amalg B={\mathbb Z}^n}.

For all {B_R(0)}, we define {J_R:=J\cap B_R(0), I_R=I\cap B_R(0)} for convenient.

Theorem 6 (The distribution of bad points) For all bad points {J_R} in {B_R(0)}, they will divide into several connected part, i.e.

\displaystyle J_R=\amalg_{i\in S_R}A_i \ \ \ \ \ (3)

and every part {A_i} satisfied {A_i\cap \partial B_R(0)\neq \emptyset}.

Remark 3 We say {A} is connected in {{\mathbb Z}^n} iff there is a path in {A} connected {x\rightarrow y, \forall x,y\in A}.

Remark 4 the meaning that every point So the behaviour of bad points are just like a tree structure given in the gragh.

Proof: A very naive observation is that for all {\Omega\subset {\mathbb Z}^n} is a connected compact domain, then there is a function

\displaystyle \lambda_{\Omega}: \partial \longrightarrow {\mathbb R} \ \ \ \ \ (4)

such that {\lambda_{\Omega}(x,y)\geq 0, \forall (x,y)\in \Omega \times \mathring{\Omega}}. And we have:

\displaystyle f(x)=\sum_{y\in\partial \Omega}\lambda_{\Omega}(x,y)(y) \ \ \ \ \ (5)


This could be proved by induction on the diameter if {\Omega}. Then, if there is a connected component of {\Omega} such that contradict to theorem 6 for simplify assume the connected component is just {\Omega}, then use the formula 5we know

\displaystyle \begin{array}{rcl} \sup_{x\in \Omega}|f(x)| & = & \sup_{x\in \Omega}\sum_{y\in\partial \Omega}\lambda_{\Omega}(x,y)(y) \\ & \leq & \sup_{\partial \Omega}|f(x)| \\ & \leq & c \end{array}

The last line is due to consider around {\partial \Omega}. But this lead to: {\forall x\in \Omega, |f(x)|<c} which is contradict to the definition of {\Omega}. So we get the proof. \Box

Now we begin another observation, that is the freedom of extension of discrete harmonic function in {{\mathbb Z}^n} is limited.

Theorem 7 we can say something about the structure of harmonic function space of {Z^n}, the cube, you will see, if add one value, then you get every value, i.e. we know the generation space of {Z^n}

Proof: For two dimension case, the proof is directly induce by the graph. The case of {n} dimensional is similar. \Box

Remark 5 The generation space is well controlled. In fact is just like n orthogonal direction line in n dimensional case.

3. sktech of the proof for \ref


The proof is following, by looking at the following two different lemmas establish by two different ways, and get a contradiction.

\paragraph{First lemma}

Lemma 8 (Discrete poisson kernel) the poisson kernel in {{\mathbb Z}^n}. We point out there is a discrete poisson kernel in {{\mathbb Z}^n}, this is given by:

\displaystyle f(x)=\sum_{y\in \partial B_R(z)}\lambda_{B_R(z)}(x,y)(y) \ \ \ \ \ (6)

And the following properties is true:

  1. {\lambda_{B_R(z+h)}(x+h,y+h)=\lambda_{B_R(z)}(x,y)} , {\forall x\in \Omega, h\in {\mathbb Z}^n}.
  2. \displaystyle \lambda_{B_R(z)}(x,y)\rightarrow \rho_R(x,y) \ \ \ \ \ (7)

Remark 6 The proof could establish by central limit theorem, brown motion, see the material in the book of Stein \cite{stein}. The key point why this lemma 8 will be useful for the proof is due to this identity always true {\forall x\in B_R(0)}, So we will gain a lots of identity, These identity carry information which is contract by another argument.

\paragraph{Second lemma} The exponent decrease of mass.

Lemma 9 The mass decrease at least for exponent rate.

Remark 7 the proof reduce to a random walk result and a careful look at level set, reduce to the worst case by brunn-minkowski inequality or isoperimetry inequality.

\paragraph{Final argument} By looking at lemma 1 and lemma 2, we will get a contradiction by following way, first the value of {f} on {\partial B_R(0)} increasing too fast, exponent increasing by lemma2, but on the other hand, it lie in the integral expresion involve with poisson kernel, but the pertubation of poisson kernel is slow, polynomial rate in fact…



\bibitem{stein} Functional analysis


Log average sarnak conjecture


This is a note concentrate on the log average Sarnak conjecture, after the work of Matomaki and Raziwill on the estimate of multiplication function of short interval. Given a overview of the presented tools and method dealing with this conjectue.


1. Introduction

Sarnak conjecture \cite{Sarnak} assert that for any obersevable {\{f(T^n(x_0))\}_{n=1}^{\infty}} come from a determination systems {(T,X),T:X\rightarrow X}, where {h(T)=0}, {x_0\in X, f\in C(X)}. The correlation of it and the Liuvillou function is 0, i.e. they are orthongonal to each other, more preseicesly it is just to say,

\displaystyle \sum_{n<x}\mu(x)f(T^n(x_0))=o(x) \ \ \ \ \ (1)


This is a very natural raised conjecture, Liuville function is the presentation of primes, due to we always believe the distribution of primes in {\mathbb N} should be randomness.

It has been known as observed by Landau \cite{Laudau} that the simplest case,

\displaystyle \sum_{n<x}\mu(n)=o(x)

already equivalent to the prime number theorem. It is not difficult to deduce the spetial case of Sarnak conjecture when with the obersevation in $latex {(1)}&fg=000000$ come from finite dynamic system is equivalent to the prime number theorem in athremetic progress by the similar argument. Besides this two classical result, may be the first new result was established by Davenport,

Theorem 1 Let {T:S_1\rightarrow S_1, T(x)=x+\alpha}, {\alpha} is a inrational, then the obersevation come from {(T,S_1)} is orthogonal to Mobius function. due to {\{e^{2\pi ikx}\}_{k\in \mathbb Z}} is a basis of {C(S_1)}, suffice to proof,

\displaystyle \sum_{n<x}e^{2\pi ikn\alpha}\mu(n)=o(x), \forall k\in \mathbb N

There is a lots of spetial situations of Sarnak’s conjecture have been established, The parts I mainly cared is the following:

  1. Interval exchange map.
  2. Skew product flow.
  3. Obersevable come from One dimensional zero entropy flow.
  4. Nilsequences.

But in this note, I do not want to explain the tecnical and tools to establish this result, but considering an equivalent conjecture of Sarnak conjecture, named Chowla conjecture, and explain the underlying insight of the suitable weak statement, i.e. the log average Chowla conjecture and the underlying insight of it.

The note is organized as following way, in the next section $latex {(2)}&fg=000000$, we give a self-contained introduction on the tools called Bourgain-Sarnak-Ziegler critation, explain the relationship of this critation and the sum-product phenomenon, also given some more general critation along the philosephy use in establish the Bourgain-Sarnak-Ziegler critation, which maybe useful in following development combine with some other tools. The key point is transform the sum from linear sum to bilinear sum and decomposition the bilinear sum into diagonal part and off-diagonal part, use the assume in the critation to argue the off-diagonal part is small and on the orther hand the diagonal part is also small by the trivial estimate and the volume of diogonal is small, this is very similar to a suitable Caderon-Zugmund decomposition.

In section $latex {(4)}&fg=000000$, I try to give a proof sketch of the result of Matomaki and Raziwill, which is also a key tools to understanding the Sarnak conjecture, or equivalent the Chowla conjecture. The key points of the proof contains following:

  1. Find a suitable fourier indentity
  2. Construct a multiplication-addition dense subset {S}, and proof that the theorem MR hold we need only to proof it hold for {S\cap [1,2,...,n]} instead of {[1,2,...,n]}
  3. Involve the power of euler product formula. divide the whole interval into a lot of small interval with smaller and smaller scale and a residue part. We look the part come from every small scale as a major term and look the residue part as minor term.
  4. Deal with the major term at every scale, by a combitorios identity and second moments method.
  5. find a enough decay estimate from a scale to the next smaller scale.
  6. Deal with the minor term by the H… lemma.

Due to the theorem of MR do not exausted the method they developed, we trying to make some more result with their method, Tao and Matomaki attain the average version of Chowla conjecture is true by this way, and combine this argument and the entropy decresment argument they established the 2 partten of the log average Chowla conjecture is true. Very recently Tao and his coperator proved the odd partten case of log average chowla conjecture is true, combine an argument of frustenberg crresponding principle and entopy decresment argument. But it seems the even and large than 2 case is much difficult and seems need something new to combine with the method of MR and entropy decresment and frunstenberg corresponfing principle to make some progress.

So, in section $latex {(5)}&fg=000000$, we give a self-contain introduction to the entropy decresment argument of Tao, and combine with the frustenberg corresponding principle.

In the last section $latex {(6)}&fg=000000$, I state some result and method and phylosphy of them I get on nilsequences and wish to combine them with the previous method to make some progress on log average Chowla conjecture on the even partten case.


2. Bourgain-Sarnak-Zieglar creation

We begin with the easiest one, this is the main result established in \cite{BSZ}, I try to give the main ideal under the proof, but with a no quantitative version is the following,

Theorem 2 (Bourgain-Sarnak-Zieglar creation, not quantitative version) if for all primes {p,q>>1} we have:

\displaystyle \sum_{n=1}^Nf(T^{pn}(x))\overline{f(T^{qn}(x))}=o(N) \ \ \ \ \ (2)


Then for multiplication function {g(n)} we have

\displaystyle \sum_{n=1}^Ng(n)\overline{ f(T^n(x))}=o(N) \ \ \ \ \ (3)


Remark 1 For simplify we identify {f(T^n(x)):=F(n)}.

Remark 2

The idea is following, break the sum into a bilinear one, so, of course, we multiplication it with itself. i.e. we consider to control,

\displaystyle |\sum_{i=1}^Ng(n)\overline{ F(n)}|^2=\sum_{n=1}^N\sum_{m=1}^Ng(n)g(m)\overline{F(n)F(m)} \ \ \ \ \ (4)


To control 4, we need exhausted the mutiplication property of {g(n)}, we have {g(mn)=g(n)g(m),\forall\ m,n\in {\mathbb N}}. We can not get good estimate for all term,

\displaystyle g(n)g(m)\overline{F(n)F(m)} \ \ \ \ \ (5)

The condition in our hand if following,

\displaystyle \sum_{n=1}^NF(pn)\overline{F(qn)}=o(N), \forall \ p,q\in \mathop{\mathbb P} \ \ \ \ \ (6)

So, just like the situation of Cotlar-Stein lemma \cite{Cotlar-Stein lemma}, we wish to estimate like following:

\displaystyle \begin{array}{rcl} |\sum_{p\in W}\sum_{n\in V}F(pn)g(pn)| & \leq & \sum_{n\in V}|g(n)|\cdot |\sum_{p\in W}F(pn)g(p)| \\ & \leq &\sum_{n\in V}|\sum_{p \in W}F(pn)g(p)|\\ & \overset{Cauchy-Schwarz}\leq & |V|^{\frac{1}{2}}[\sum_{n\in V}|\sum_{p\in W}F(pn)g(p)|^2]^{\frac{1}{2}}\\ & = & |V|^{\frac{1}{2}}[\sum_{p_1,p_2\in W}\sum_{n\in V}F(p_1n)\overline{F(p_2n)}g(p_1)\overline{g(p_2)}]^{\frac{1}{2}}\\ \end{array}

Then we consider divide the sum into diagonal part and non-diagonal part, as following,

\displaystyle |V|^{\frac{1}{2}}[\sum_{p_1\neq p_2\in W}\sum_{n\in V}F(p_1n)\overline{F(p_2n)}g(p_1)\overline{g(p_2)}]^{\frac{1}{2}}+|V|^{\frac{1}{2}}[\sum_{p\in W}\sum_{n\in V}|F(pn)|^2]^{\frac{1}{2}} \ \ \ \ \ (7)

But the first part is small, i.e.

\displaystyle |V|^{\frac{1}{2}}[\sum_{p_1\neq p_2\in W}\sum_{n\in V}F(p_1n)\overline{F(p_2n)}g(p_1)\overline{g(p_2)}]^{\frac{1}{2}} =o(|W||V|) \ \ \ \ \ (8)

Because of

\displaystyle \sum_{n\in V}F(p_1n)\overline{F(p_2n)}=o(V), \forall p_1\neq p_2\in W \ \ \ \ \ (9)

and the second part is small, i.e.

\displaystyle |V|^{\frac{1}{2}}[\sum_{p\in W}\sum_{n\in V}|F(pn)|^2]^{\frac{1}{2}}=o(|W||V|) \ \ \ \ \ (10)

Because diagonal part is small in {W\times W} and trivial inequality

\displaystyle \sqrt{\sum_{n\in V}|F(pn)|^2}\leq |V|^{\frac{1}{2}} \ \ \ \ \ (11)

But the method in remark 2 is not always make sense in any situation, we need to construct two suitable sets {W,V} and then break up {\{1,2,...,n{\mathbb N}\}} into {W\times V}, this mean,

\displaystyle \{1,2,...,N\}\sim W\times V+o(N) \ \ \ \ \ (12)

But this {W,V} could be construct in this situation, thanks to the prime number theorem,

Theorem 3 (Prime number theorem)

\displaystyle \pi(n)\sim \frac{n}{ln(n)} \ \ \ \ \ (13)

Morally speaking, this is the statement that the primes, which is the generator of multiplication function, is not very sparse.

3. Van der curpurt trick

There is the statement of Van der carport theorem:

Theorem 4 (Van der curpurt trick) Given a sequences { \{x_n\}_{n=1}^{\infty}} in { S_1}, if { \forall k\in N^*}, { \{x_{n+k}-x_n\}} is uniformly distributed, then { \{x_n\}_{n=1}^{\infty}} is uniformly distributed.

I do not know how to establish this theorem with no extra condition, but this result is true at least for polynomial flow. \newpage Proof:

\displaystyle \begin{array}{rcl} |\sum_{n=1}^Ne^{2\pi imQ(n)}|& = &\sqrt{(\sum_{n=1}^Ne^{2\pi imQ(n)})(\overline{\sum_{n=1}^Ne^{2\pi imQ(n)}})}\\ & = &\sqrt{\sum_{h_1=1}^N\sum_{n=1}^{N-h_1}e^{2\pi imQ(n+h_1)-Q(n)}}\\ & = &\sqrt{\sum_{h_1=1}^N\sum_{n=1}^{N-h_1}e^{2\pi im \partial^1_{h_1}Q(n)}}\\ & \leq & \sqrt{\sum_{h_1=1}^N|\sum_{n=1}^{N-h_1}e^{2\pi \partial^1_{h_1}Q(n)}|}\\ & = &\sqrt{\sum_{h_1=1}^N\sqrt{ (\sum_{n=1}^{N-h_1}e^{2\pi \partial^1_{h_1}Q(n)} )(\overline{\sum_{n=1}^{N-h}e^{2\pi \partial^1_{h_1}Q(n)})}}}\leq\sqrt{\sum_{h_1=1}^N\sqrt{ \sum_{h_2=1}^N|\sum_{n=1}^{N-h_1}e^{2\pi\partial^1_{h_2} \partial^1_hQ(n)} |}}\\ & \leq ....\leq & \\ & = & \sqrt{\sum_{h_1=1}^N\sqrt{ \sum_{h_2=1}^N \sqrt{....\sqrt{\sum_{h_{k-1}=1}^{N-h_{k-2}}|\sum_{n=1}^{N-h_{k-1}}e^{2\pi\partial_{h_1h_2...h_{k-1}Q(n)}}|}}}} =o(1) \end{array}


This type of trick could also establish the following result, which could be understand as a discretization of the Vinegradov lemma.

Remark 3

Uniformly distribution result of { F_p}: Given {Q(n)=a_kn^k+...+a_1n+a_0}, {\{Q(0),Q(1),...,Q(p-1)\}} coverages to a uniformly distribution in {\{0,1,...,p-1\}} as {p \rightarrow \infty}.

Remark 4 But I definitely do not know how to establish the similar result when {Q(n)=n^{-1}}.

Remark 5

This trick could also help to establish estimate of correlation of low complexity sequences and multiplicative function, such as result:

\displaystyle S(x)=\sum_{n\le x}\left(\frac{n}{p}\right)\mu(n)=o(n)

Maybe with the help of B-Z-S theorem.


4. Matomaki and Raziwill’s work

In this section we explain the main idea underlying the paper \cite{KAISA MATOMA 虉KI AND MAKSYM RADZIWILL}. But play with a toy model, i.e. the corresponding corollary of the original result on Liouville鈥檚 function.

Definition 5 (Lioville’s function)

\displaystyle \lambda(n)=(-1)^{\alpha_1+\alpha_2+...+\alpha_k}, \forall \ n=p_1^{\alpha_1}...p_k^{\alpha_k}. \ \ \ \ \ (14)

Remark 6

\displaystyle |\int_{X}^{2X}\lambda(n)dx|=o(x) \ \ \ \ \ (15)

is equivalent to the prime number theorem 3.

The most important beakgrouth of analytic number theory is the new understanding of multiplication function on share interval, this result is established by Kaisa Matom盲ki and Maksym Radziwill. Two very young and intelligent superstars.

The main theorem in them article is :

Theorem 6 (Matomaki,Radziwill) As soon as {H\rightarrow \infty} when {x\rightarrow \infty}, one has:

\displaystyle \sum_{x\leq n\leq x+H}\lambda(n)= o(H) \ \ \ \ \ (16)

for almost all {1\leq x\leq X} .

In my understanding of the result, the main strategy is:

  1. Parseval indetity, transform to Dirchelet polynomial.
  2. Involved by multiplication property, spectral decomposition.
  3. From linear to multilinear , Cauchy schwarz inequality.
  4. major term estimate.
  5. Estimate the contribution of area which is not filled.

4.1. Parseval indetity, transform to Dirchelet polynomial

We wish to establish the equality,

\displaystyle \frac{1}{X}\int_{X}^{2X}|\sum_{x\leq n\leq x+H}\lambda(n)|dx=o(H) \ \ \ \ \ (17)

This is the {L^1} norm, by Chebyschev inequality, this could be control by {L^2} norm, so we only need to establish the following,

\displaystyle \frac{1}{X}\int_X^{2 X}|\sum_{x\leq n\leq x+H}\lambda(n)|^2dx=o(H^2) \ \ \ \ \ (18)


We wish to transform from the discretization sum to a continue sum, that is,

\displaystyle \int_{{\mathbb R}}|\sum_{xe^{-\frac{1}{T}}\leq n\leq xe^{\frac{1}{T}}}\lambda(n)1_{X\leq n\leq 2X}|^2\frac{dx}{x} \ \ \ \ \ (19)


Remark 7 There are two points to understand why 19 and 18 are the same.

  1. {[xe^{-\frac{1}{T}},xe^{\frac{1}{T}}]\sim [x-H,x+H]}.
  2. {1_{x\leq n\leq 2x}} and {\frac{1}{x}} is to make that {x=O(X)}.

So the Magnitude of 18 and 19 are the same. i.e.

\displaystyle \int_{{\mathbb R}}|\sum_{xe^{-\frac{1}{T}}\leq n\leq xe^{\frac{1}{T}}}\lambda(n)1_{X\leq n\leq 2X}|^2\frac{dx}{x}\sim \frac{1}{X}\int_X^{2 X}|\sum_{x\leq n\leq x+H}\lambda(n)|^2dx \ \ \ \ \ (20)

Now we try to transform 19 by Parseval indetity, this is something about the {L^2} norms of the quality we wish to charge. It is just trying to understanding 19 as a quantity in physical space by a more chargeable quality in frequency space. Image,

\displaystyle \int_{{\mathbb R}}|\sum_{xe^{-\frac{1}{T}}\leq n\leq xe^{\frac{1}{T}}}\lambda(n)1_{X\leq n\leq 2X}|^2\frac{dx}{x}:=\int_{{\mathbb R}}|f_X(x)|^2dx \ \ \ \ \ (21)

Then {f_X(x)=\int_{xe^{-\frac{1}{T}}\leq n\leq xe^{\frac{1}{T}}}\lambda(x)1_{X\leq n\leq 2X}}. Note that,

\displaystyle \begin{array}{rcl} \widehat{f_X(\xi)} & = & \int_{{\mathbb R}}f_X(x)e^{2\pi ix\xi}dx\\ & = & \sum_{x\leq n\leq 2x}\lambda(x)\int_{logn-\frac{1}{T}}^{logn+\frac{1}{T}}e^{2\pi ix\xi}dx, \ T=\frac{X}{H}\\ & = & \sum_{X\leq n\leq 2X}\lambda(x)e^{2\pi ilog(n)\cdot \xi}\cdot\frac{e^{2\pi i\frac{\xi}{T}}-e^{2\pi i-\frac{\xi}{T}}}{2\pi i\xi}\\ \end{array}

So by Parseval identity, we have,

\displaystyle \begin{array}{rcl} \int_{{\mathbb R}}|f_X(x)|^2dx & = & \int_{{\mathbb R}}|\widehat{f_X(\xi)}|^2d\xi \\ & = & \int_{{\mathbb R}}|\sum_{X\leq n\leq 2X}\lambda(n)\cdot n^{2\pi i\xi}|^2(\frac{e^{2\pi i\frac{\xi}{T}}-e^{2\pi i\frac{-\xi}{T}}}{2\pi i\xi})^2d\xi\\ & \sim & \int_{{\mathbb R}}|\sum_{X\leq n\leq 2X}\lambda(n)\cdot n^{2\pi i\xi}|^2\frac{1}{T^2}1_{|\xi|^2\leq T}\\ \end{array}

Remark 8 We know the Fejer kernel satisfied,

\displaystyle (\frac{e^{2\pi i\frac{\xi}{T}}-e^{2\pi i\frac{-\xi}{T}}}{2\pi i\xi})^2\sim \frac{1}{T^2}1_{|\xi|\leq T} \ \ \ \ \ (22)

So morally speaking, we get the following identity.

\displaystyle \frac{1}{X}\int_{X}^{2X}|\sum_{x\leq n\leq x+H}\lambda(n)|^2dx\sim \frac{1}{(x/H)^2}\int_{0}^{\frac{X}{H}}|\sum_{x\leq n\leq 2x}\lambda(x)x^{2\pi i\xi}|^2d\xi \ \ \ \ \ (23)

In fact we do a cutoff, the quality we really consider is just:

\displaystyle \frac{1}{X^2}\int_{|log(X)|^{100}}^{\frac{X}{H}}|\sum_{n\leq X}\lambda(n)n^{it}|^2dt \ \ \ \ \ (24)

established the monotonically inequality:

Theorem 7 (Paserval type identity)

\displaystyle \frac{1}{X}\int_{X}^{2X}|\frac{1}{H}\sum_{x\leq n\leq x+H}\lambda(n)|^2dx \sim聽\frac{1}{X^2}\int_{|log(X)|^{100}}^{\frac{X}{H}}|\sum_{n\leq X}\lambda(n)n^{it}|^2dt \ \ \ \ \ (25)


Remark 9

In my understanding, This is a perspective of the quality, due to the quality is a multiplicative function integral on a domain { \mathbb N^*} with additive structure, it could be looked as a lots of wave with the periodic given by primes, so we could do a orthogonal decomposition in the fractional space, try to prove the cutoff is a error term and we get such a monotonically inequality.

But at once we get the monotonically inequality, we could look it as a聽compactification process and this process still carry most of the information so lead to the inequality.

It seems something similar occur in the attack of the moments estimate of zeta function by the second author. And it is also could be looked as something similar to the 聽spectral decomposition with some basis come from multiplication generators, i.e. primes.

4.2. Involved by multiplication property, spectral decomposition

I called it is “spectral decomposition”, but this is not very exact. Anyway, the thing I want to say is that for multiplication function {\lambda(n)}, we have Euler-product formula:

\displaystyle \Pi_{p,prime}(\frac{1}{1-\frac{\lambda(p)}{p^s}})=\sum_{n=1}^{\infty} \frac{\lambda(n)}{n^s} \ \ \ \ \ (26)


But anyway, we do not use the whole power of multiplication just use it on primes, i.e. {\lambda(pn)=\lambda(p)\lambda(n)} leads to following result:

\displaystyle \lambda(n)=\sum_{n=pm,p\in I}\frac{\lambda(p)\lambda(m)}{\# \{p|m, p\in I\}+1}+\lambda(n)1_{p|n;p\notin I} \ \ \ \ \ (27)

This is a identity about the function {\lambda(n)}, the point is it is not just use the multiplication at a point,i.e. {\lambda(mn)=\lambda(m)\lambda(n)}, but take average at a area which is natural generated and compatible with multiplication, this identity carry a lot of information of the multiplicative property. Which is crucial to get a good estimate for the quality we consider about.

4.3. From linear to multilinear , Cauchy schwarz

Now, we do not use one sets {I}, but use several sets {I_1,...,I_n } which is carefully chosen. And we do not consider [X,2X] with linear structure anymore , instead reconsider the decomposition:

{[X,2X]=\amalg_{i=1}^n (I_i\times J_i) \amalg U}

On every {I_i\times J_i} it equipped with a bilinear structure. And {U} is a very small set, {|U|=o(X)} which is in fact have much better estimate.

{\int_{|log(X)|^{100}}^{\frac{X}{H}}|\sum_{n\leq X}\lambda(n)n^{it}|^2dt =\sum_{i=1}^n\int_{I_i\times J_i}聽聽\frac{1}{X^2}\int_{|log(X)|^{100}}^{\frac{X}{H}}|\sum_{n\leq X}\lambda(n)n^{it}|^2dt +\int_N |\sum_{n\leq X}\lambda(n)n^{it}|^2dt}

Now we just use a Cauchy-Schwarz:

{\sum_{i=1}^n\int_{I_i\times J_i}聽聽\frac{1}{X^2}\int_{|log(X)|^{100}}^{\frac{X}{H}}|\sum_{n\leq X}\lambda(n)n^{it}|^2dt +\int_N |\sum_{n\leq X}\lambda(n)n^{it}|^2dt}

4.4. major term estimate

{=\sum_{i=1}^n\int_{I_i\times J_i}聽聽\frac{1}{X^2}\int_{|log(X)|^{100}}^{\frac{X}{H}}|\sum_{n\leq X}\lambda(n)n^{it}|^2dt}

{\int_N |\sum_{n\leq X}\lambda(n)n^{it}|^2dt}

4.5. estimate the contribution of area which is not filled


5. Entropy dcrement argument


6. Correlation with nilsequences

I wish to establish the following estimate: {\lambda(n)} is the liouville function we wish the following estimate is true.

\displaystyle \int_{0\leq x\leq X}|\sup_{f\in \Omega^m}\sum_{x\leq n\leq x+H}\lambda(n)e^{2\pi if(x)}|dx =o(XH). \ \ \ \ \ (28)

Where we have { H\rightarrow \infty} as { x\rightarrow \infty},

\displaystyle \Omega^m=\{a_mx^m+a_{m-1}x^{m-1}+...+a_1x+a_0 | a_m,...,a_1,a_0\in [0,1]\}

is a compact space.

I do not know how to prove this but this is result is valuable to consider, because by a Fourier identity we could transform the difficulty of (log average) Chowla conjecture to this type of result.

There is some clue to show this type of result could be true, the first one is the result established by Matomaki and Raziwill in 2015:

Theorem 8 (multiplication function in short interval)

{f(n): \mathbb N\rightarrow \mathbb C} is a multiplicative function, i.e. { f(mn)=f(n)f(m), \forall m,n\in \mathbb N}. {H\rightarrow \infty} as {x\rightarrow \infty}, then we have the following result,

\displaystyle \int_{1\leq x\leq X}|\sum_{x\leq n\leq x+H}f(n)|=o(XH). \ \ \ \ \ (29)

And there also exists the result which could be established by Vinagrodov estimate and B-S-Z critation :

Theorem 9 (correlation of multiplication function and nil-sequences in long interval)

{f(n): \mathbb N\rightarrow \mathbb C} is a multiplicative function, i.e. { f(mn)=f(n)f(m), \forall m,n\in \mathbb N}. {g(n)=a_n^m+...+a_1n+a_0} is a polynomial function then we have the following result,

\displaystyle \int_{1\leq n \leq X}|f(n)e^{2\pi i g(n)}|=o(X) \ \ \ \ \ (30)

\newpage {9} \bibitem{Sarnak} Peter Sarnak, Mobius Randomness and Dynamics.

\texttt{https://publications.ias.edu/sites/default/files/Mahler }. \bibitem{Laudau} JA 虂NOS PINTZ (BUDAPEST). LANDAU鈥橲 PROBLEMS ON PRIMES.

\texttt{https://users.renyi.hu/~pintz/pjapr.pdf} \bibitem{BSZ} Knuth: Computers and Typesetting,


\bibitem{Cotlar-Stein lemma} Almost orthogonality





Two stupid question

The story of the infinite dimensional space of $\Delta$ is following, we eliminate ourself with compact smooth non-boundary manifold $M$ with metric $g$, then we have Betrami-Laplace operator $\Delta_g$. We could instead $\Delta_g$ by hodge laplace $dd^*+d^*d$, but let we consider $\Delta_g$ the eigenvalue problem:
$$\Delta_g u=\lambda u$$
A classical way to investigate the eigenvalue problem is according to consider variational principle and max-min principle. We equip the path integral on the function space $C^{\infty}(M)$:
$$E(f)=\frac{\int_M |\nabla u|^2}{\int_M |u|^2 }$$
Then it have a sequences of eigenvalue, negative of course: $$0<-\lambda_1<-\lambda_2<…<\lambda_k<…$$

Then things become interesting, the morse theory of infinite space involve, called the infinite space as $X$, so at least, shrink the far place of $X$ as a point, in physics, this mean, cut off at fix scale. And we can take the scale to infinite small, we use the cutoff one to approximation the real one. What I can do is the following, I can proof the eigenvalue function is uniformly distributed in $L^2(M_g)$ (after rescaling of course) and the classical weyl law(although can not give a good error term estimate), but thing become more complicated when I try to consider the infinite space $X_{M_g}$’s topology, at finite scale at least, i.e. $X_{M_g}^{h}$ which is the cut off at scale $h$. Among the other thing, I believe the following issue is true, but without ability to proof it:

for every manifold $M$ and metric $g$ on $M$, the topology of infinite space $X_{M_g}$ is the same, beside this, the inverse could be true, i.e. If $X_{M_1},X_{M_2}$ is not homomorphism for some scale $h$ then $M_1,M_2$ is not homomorphism.






By intuition, I think it is depend by the underling manifold’s topology. But I do not have a rigorous proof, I definitely have a non-rigorous one, if ignore the coverage…

As I find this problem when I try to give a proof of weyl law, I do not check the reference, may be this problem is a classical one? As always, I will appreciate to any interesting comments and answers, thanks a lots!A









We begin with our favorite situation, the Dirchlet problem on bounded simple-connected domain $\Omega$ in $\mathbb R^n$. Let $\lambda_1$ be the first eigenvalue of $$\Delta u=\lambda u \ in\ \Omega$$
$$u=0\ \ on\ \partial\Omega$$
Rescaling $u$ such that $\sup_{\Omega} u=1$, I think the following property of the first eigenvalue is true.
We have, the Minkowski functional of $\Omega$, called $M_{\Omega}$ and the Minkowski functional with the ball $B$ such that $vol(B)=vol(\Omega)$, then along the level set of $u$, i.e. the fiber: $$\Omega=\cup_{t\in [0,1]}l_t, l_t:=\{t|x\in \Omega, u(x)=t\}$$
We pretend for the isolate point $l_1$ to be a ball with radius 0, so equipped it with the uniformly density at every direction in $S^1$, i.e. the mass distribution given by $M_B$ and the total mass coincide with the total mass induce by $M_{\Omega}$ in $l_0$, i.e.
$$\int_{e\in S_1}M_{\Omega}(e)d\mu=\int_{e\in S_1}M_{B}(e)d\mu$$
The measure $d\mu$ equipped on $S^1$ is the natural Haar measure. And the cost function is given by $c(x,y)=\|x-y\|^2$. Then, among this setting,
I wish the following property to be true:
Along the direction $1\to 0$, the transport of density $\partial_{t_0} M_{\cup_{t=t_0}^1l_t}$ given the unique optimal transport of the natural measure induce by $M(\Omega)$ and $M(B)$.

**Remark 1** As point out by SebastianGoette, the multiplicity of the first eigenvalue must be one, thanks to the eigenfunction never change the symbol, so we are in the best case.

**Remark 2**:I am not very sure this property could always true, there may be a center example when $\Omega$ is not convex, but I tend to believe it is true at least when $\Omega$ is convex.

**Remark 3**: As point out by Dirk, when you try to consider the optimal transport problem, you always need to point out the cost function $c(x,y)$ defined on $\Omega \times \Omega$, for there, I think the naive choice is $c(x,y)=\|x-y\|^2$


The thing I can proof is the following, the level set of $u$ should be convex by brunn-minkowski inequality, and some type of monotonically property, i.e. more and more like a ball when the level set is more and more shirking smaller form $\partial \Omega$ to the point $f$ arrive maximum.

I will appreciate for any relevant comments and answer, thanks!



Uncertainty principle

The pdf version is Uncertainty principle. The nice note of terrence tao seems given a nice answer for the problem below.

1. Introduction

Is there a Brunn-Minkowski inequality approach to the phenomenon charged by uncertainty principle? More precisely, is it possible to say some thing about the Gaussian distribution

\displaystyle G(x)=e^{-|x|^2} \ \ \ \ \ (1)


to be the best choice that {\|\hat G-G\|_2} arrive minimum?

Remark 1 Or some other suitable distance space on reasonable function (may be some gromov hausdorff distance? Any way, to say the guassian distribution is the best function to defect the influence of uncertain principle.

I do not know the answer of the problem 1, but this is a phenomenon of a universal phylosphy, aid, uncertainty principle, heuristic:

It is not possible for both function {f} and its Foriour transform {\hat f} to be localized on small set.

Now let me give some approach by intuition to explain why the phenomenon of “uncertainty principle” could happen.

The approach is based on:

  1. level set decomposition.
  2. area formula (or coarea formula), anyway, some kind of change variable formula.
  3. integral by part.
  4. Basic understanding on exponential sum.

Let our function {f\in S} the Shwarz space, we begin with a intuition (not very rigorous) calculate:

\displaystyle \begin{array}{rcl} \hat f(\xi) & = & \int e^{2\pi i<\xi, x>}f(x)dx\\ & \overset{integral \ by \ part}= &\int \frac{1}{-2\pi i\xi}e^{-2\pi i<x,\xi>}\cdot \nabla f(x)\\ & \overset{Fubini}= & \int_{inf |f|}^{max |f|}\int_{level set A(t)} \frac{-e^{2\pi i<\xi,x>}}{-2\pi i\xi}\nabla f(x)dH^{n-1}(A)dt \end{array}

Now we try to understanding the result of the calculate, it is,

\displaystyle \hat f(\xi) =\int_{inf |f|}^{max |f|}\int_{level set A(t)} \frac{-e^{2\pi i<\xi,x>}}{-2\pi i\xi}\nabla f(x)dH^{n-1}(A)dt \ \ \ \ \ (2)


\displaystyle \pounds(A(t),\xi)=\int_{level set A(t)} \frac{-e^{2\pi i<\xi,x>}}{-2\pi i\xi}\nabla f(x)dH^{n-1}(A) \ \ \ \ \ (3)


The calculate is wrong, but not very far from the thing that is true, the key point is now the exponential sum involve. We could use the pole coordinate in the frequence space and get some very rough intuition of why the the uncertainty principle could occur.

Remark 2 Why we consider the level set decomposition, due to the integral is a combination of linear sum of the integral on every level set, so shape of level set is the key point.

The part of {\frac{-e^{2\pi i<\xi,x>}}{-2\pi i\xi}} in 2 is a rotation on the level set, a wave correlation of it and the christization function {\chi_{A_t}} of level set {A_t} in the whole space, this is of course a exponential sum.

Now we can begin the final intuition explain of the phenomenon of uncertainty principle. If the density of function {f} is very focus on some small part of the physics space, then it is the case for level sets of {f}, but we could say some thing for the exponential sum {\pounds(A(t),\xi)} 3 related to the level set, just by very simply argument with hardy litterwood circle method or Persaval identity? Any way, something similar to this argument will make sense, due to if the diameter of level set focus ois small, then we can not get a decay estimate for {\pounds(A(t),\xi)} when {\xi\rightarrow \infty} along one direction in frequency space, in fact we could say the inverse, i.e. it could not decay very fast.

2. Bernstein’s bound and Heisenberg uncertainty principle



2.1. Motivation and Bernstein’s bound

There is two different Bernstein’s bound, we discuss the first with the motivation, and proof the second rigorously. \paragraph{Form 1} {A} is a invertible affine map, then for a ball {B}, {A(B)=\epsilon} is a ellipsoid.

\displaystyle \epsilon=\{x\in {\mathbb R}^d|\sum_{j=1}^{d}r_j^{-2}(x_j-y_j)^2\leq 1\} \ \ \ \ \ (4)


By a orthogonal transform we could make {A} to be a diagonal matrix, i.e. {A=diag(r_1,...,r_d)}. It is said, for {\forall f\in S} or {f} is a smooth bump function, {f_A=f\circ A^{-1}}, so we have,

\displaystyle \hat f_A(\xi)=\int e^{2\pi i<x,\xi>}\cdot f\circ A^{-1}(x)dx \ \ \ \ \ (5)

We define dual of {\epsilon}, {\epsilon^*:=\{\xi\in {\mathbb R}^d| \sum_{j=1}^d\xi_j^2r_j^2\leq 1\}}.

Remark 3 Why there we use the metric {\xi_j^2r_j^2\leq 1} but not the standard inner product {<\xi,x>}? How to understand the choice?

Proposition 1 We have the following property:

  1. {f_A\in L^{\infty}\Longrightarrow \|\hat f_A\|_{1}\leq +\infty}.
  2. {|\hat f_A(\xi)|\leq c_N|\xi|(1+|\xi|^2_{\epsilon^*})^{-N}}


Remark 4

\displaystyle |\xi|^2_{\epsilon^*}=\sum_{i=1}^d\xi_j^2r_j^2

This is a norm of {{\mathbb R}^d} related to {\epsilon^*}.

Proof: Suffice to proof 2.

\displaystyle \begin{array}{rcl} |\hat f_A(\xi)| & = & |\int_{{\mathbb R}^d}e^{2\pi i<x,\xi>}f_A(x)dx|\\ & \overset{integral \ by \ part}\sim & \frac{1}{(1+|\xi|)^N}\int |e^{2\pi i<x,\xi>}\partial^N f_A(x)dx|\\ & = & c_N |\xi|(1+|\xi|_{\epsilon^*}^2)^{-N} \end{array}


More quantitative we have rigorous one: \paragraph{Form 2} If {f\in L^2({\mathbb R}^d)}, {supp f\in B(r,0)}, then it is not possible for {\hat f} to be concentrate on a scale much less than {R^{-1}}.

Proposition 2 (Bernstein’s bound) Suppose {f\in L^2({\mathbb R}^d)}, {supp f\subset B_R(0)}. Then,

\displaystyle \|\partial^{\alpha}\hat f\|_2\leq (2\pi r)^{|\alpha|}\|f\|_2, \forall \alpha. \ \ \ \ \ (6)

Proof: {\alpha=0} case is trivial by Paserval identity, which said on {L^2({\mathbb R}^d)}, fourier transform is a isometry, {\|f\|_2=\|\hat f\|_2}. For general case, integral by part, and use trivial estimate,

\displaystyle \begin{array}{rcl} \|\partial^{\alpha}\hat f\|_2 & \overset{integral \ by \ part}= & \|x^{\alpha} f\|_2\\ & \leq & (2\pi r)^{\alpha}\|f\|_2 \end{array}


2.2. Heisenberg inequality

Theorem 3 (Heisenberg uncertain principle) {f\in L^2({\mathbb R}^d)}, so {\hat f\in L^2({\mathbb R}^d)}, {\|f\|_2=\|\hat f\|_2}. then for any {x_0,\xi_0\in {\mathbb R}^d}, every direction, we have

\displaystyle \|f\|_2^2=\|f\|_2\|\hat f\|_2\leq \|(x-x_0)f\|_2\|(\xi-\xi_0)\hat f\|_2 \ \ \ \ \ (7)


Remark 5 We could understand the inequality by the following way. suffice to prove it with {f\in S} and then by approximation argument. {f\otimes \hat f\in S({\mathbb R}^d\times {\mathbb R}^d)}, define {\|f\otimes \hat f\|_2:= \|f\|_{L^2({\mathbb R}^d)}\cdot \|\hat f\|_{L^2({\mathbb R}^d)}}. then we have the following:

\displaystyle \|f\otimes \hat f\|_{L^2}\leq 4\pi \|xf\otimes \hat{xf}\|_{L^2} \ \ \ \ \ (8)

Remark 6 The inequality is shape, the extremizers being precisely given by the modulated Gaussians: arbitrary

\displaystyle f(x)= c e^{2\pi i\xi_0x}e^{-\pi \delta(x-x_0)^2} \ \ \ \ \ (9)


There are two proof strategies I have tried, I try them for several hour but not work out with a satisfied answer, the method more involve, I explain what happen in section 1, I have not tried, I will try it later. Both this two strategies i face some difficulties, I explain why I can not work out them with a proof: \paragraph{Strategy 1} The first one is, we could work with {f\in S} of course, by approximation, then we find, by Paserval, {\|f\|_2=\|\hat f\|_2, \|\partial_x f\|_2=\|\xi \hat f\|2} and are both true. then we use our favourite way to use Cauchy-Schwarz, the difficulty is we can not use a integral by part argument directly, even after restrict ourselves with monotonic radical symmetry inequality and by a rearrangement inequality argument, it seems reasonable due to rearrangement decreasing the kinetic energy as said in Lieb’s book. But even work with monotonic one, then one involve with some complicated form, try to use Fubini theorem to rechange the order of integral try to say something, it is possible to work out by this way but I do not know how to do. There is some calculate under this way,

\displaystyle \begin{array}{rcl} \|xf\|_2\|\xi \hat f\|_2 & \overset{Cauchy-Schwarz}\geq & \int xf\cdot \partial_x f\\ & \sim & \int f^2 \end{array}

but you know, at a point we have {\partial_x(Xf)=f+x\partial_x f\neq f}, the reasonable calculate is following,

\displaystyle \partial_x(xf)=f+x\partial_x f \ \ \ \ \ (10)

We want {\partial_x P(x,f) =f}, Then

\displaystyle \begin{array}{rcl} \partial_x P(x,f) & = & f\\ & = & \partial_x(xf)-x\partial_x f\\ & = & \partial_x(xf)-\partial_x(\frac{1}{2}x^2\partial_x f)+\frac{1}{2}x^2\partial_{x^2}f\\ & = & \partial_x(\frac{1}{6}x^3\partial_{x^2}f)-\frac{1}{6}x^3\partial_{x^3}f\\ ...\\ & = &\partial_x(\sum_{i=1}^{\infty}(-1)^{i+1}x^i\frac{1}{i!}\partial_{x^i}f)+(-1)^{i+1}x^i\frac{1}{i!}\partial_{x^{i+1}}f \end{array}

Seems to be {f=\partial_x(ln(f))}… I do not know.

\paragraph{Strategy 2} The second strategy is, in the quantity {\|xf\|_2\|\xi \hat f\|_x} we lose two cone very near {x_0,\xi}, we need use the extra thing to make up them. May be effective argument come from some geometric inequality.

3. The Amerein-Berthier theorem

Next we investigate following problem, the problem is following: if {E,F\subset {\mathbb R}^d} are of finite measure, can there be a nonzero {f\in L^2({\mathbb R}^d)} with {supp (f)\subset E} and {supp(\hat f)\subset F}? Some argument is folowing: Observe that:

\displaystyle \chi_{F}\hat f=\hat f \Longrightarrow \chi_{E}(\chi_F \hat f)^{\vee}=f. \ \ \ \ \ (11)

Assume that: {Tf:=\chi_{E}(\chi_F \hat f)^{\vee}} then {Tf=f}. So we have, at least {\|T\|_{2-2}\geq 1}. Some dirty calculate show:

\displaystyle \begin{array}{rcl} (Tf)(x) & = & \int e^{2\pi i\xi x}\chi_F\hat f(\xi)\chi_E(x)d\xi\\ & = & \int \int e^{2\pi i\xi(x-y)}f(y)\chi_F(\xi)\chi_E(x)dy d\xi\\ & \overset{Fubini}= &\int_{{\mathbb R}^d}\chi_E(x)\chi_F^{\vee}(x-y)f(y)fy \end{array}

So we can define kernel of {T},

\displaystyle K(x,y)=\chi_E(x)\chi_F(x-y)^{\vee} \ \ \ \ \ (12)


By Fubini, we calculate the Hilbert-Schmidt norm:

\displaystyle \int_{{\mathbb R}^{2d}}|K(x,y)|^2dxdy=|E||F|=\sigma^2<+\infty \ \ \ \ \ (13)


So {T} is a compact operator and its {L^2} operator norm satisfied {\|T\|=\min(\sigma,1)}. So if {\sigma<1} then we can canculate we can not have {f\neq 0} in the original question.

The story is in fact more interesting, the answer of the question is no even for {\sigma\geq 1}, so in all case. We have the following quatitative theorem:

Theorem 4 {E,F} finite measure in {{\mathbb R}^d}, then

\displaystyle \|f\|_{L^2({\mathbb R}^d)}\leq c(\|f\|_{L^2(E^c)}+\|\hat f\|_{L^2(F^c)}) \ \ \ \ \ (14)

for some constant {c=c(E,F,d)}.

Remark 7 There is a naive approach for this theorem: Area formula trick, the shape of level set. Obvioudly we have:

\displaystyle \|f\|_{L^2({\mathbb R}^d)}\leq \|f\|_{L^2(E)}+\|f\|_{L^2(E^c)} \ \ \ \ \ (15)


Key point is proof:

\displaystyle \|f\|_{L^2(E)}\leq c(E,F,d)\|\hat f\|_{L^2(F^c)} \ \ \ \ \ (16)


Let us do some useless further calculate:

\displaystyle \begin{array}{rcl} |f\|_{L^2(E)} & = & \|\chi_E \cdot f\|_{L^2({\mathbb R}^d)}\\ & = & \|\widehat {\chi_E\cdot f}\|_{L^2({\mathbb R}^d)}\\ & = & \|\hat \chi_E \cdot \hat{f^{\vee}}\|_{L^2({\mathbb R}^d)} \\ & = & \|\chi_E^{\vee} * f^{\vee}\|_{L^2({\mathbb R}^d)} \end{array}

So suffice to have:

\displaystyle \|\chi_E^{\vee} * f^{\vee}\|_{L^2({\mathbb R}^d)}\leq c(E,F,d)\|\hat f\|_{L^2(F^c)} \ \ \ \ \ (17)

But there is connter example given by modified scaling Gaussian distribution… The point is form 15 to 16 is too loose.

Following I given a right approach, following by my sprite on level set and area formula argument and discritization.

Proof: The story is the same for a discretization one. We need point out, change the space {{\mathbb R}^d} to {{\mathbb Z}^d}, then every thing become a discretization one, and the change could been argue as a approximation way. What happen then, we have a naive picture in mind which is:

\displaystyle \delta \rightarrow wave , \ wave \rightarrow \delta

What is the case with {L^2} norm, it become the standard nner product on {{\mathbb Z}^d}, and the scale involve, i.e. we have the following basic estimate:

\displaystyle \|\chi_E f\|_1^2\leq \|\chi_E f\|_2 \cdot|E| \ \ \ \ \ (18)


Now image if the density of {f} concentrate in a very small area, then by a cut off argument we consider the supp of {f}, {supp f=E} is very small, then use the argument 18, we could conclute the density of {\hat f} could not very concentrate in the fraquence space. The constant {c(E,F,d)} could be given presicely by this way, but I do not care about it. \Box


4. Logvinenko-Sereda theorem

Next we formulate some result that provide further evidence of the non-concentration property of functions with Fourier support on {B_1}.

4.1. A toy model

Theorem 5 Let {\alpha>1} an suppose that {S\subset {\mathbb R}^d} satisfies,

\displaystyle |S\cap B|<\alpha |B|, \ for \ all \ balls \ B \ of \ radius\ 1. \ \ \ \ \ (19)

If {f\in L^2({\mathbb R}^d)} satisfies {supp(\hat f)\subset B(0,1)} then

\displaystyle \|f\|_{L^2(S)}\leq \delta(\alpha)\|f\|_2 \ \ \ \ \ (20)

Where {\delta(\alpha)\rightarrow 0} as {\alpha \rightarrow 0}.

Proof: This is a easy corollary of the argument I give in the proof of Amerein-Berthier theorem 4. \Box

4.2. A refine version

Theorem 6 Suppose that a measurable set {E\subset {\mathbb R}^d} satisfies the following “thinkness” condition: there exists {\gamma\in (0,1)} such that

\displaystyle |E\cap B|>\gamma |B| \ for \ all \ balls \ B \ of \ radius\ R^{-1}. \ \ \ \ \ (21)

where {R>0} is arbitrary but fixed. Assume that {supp(\hat f)\subset B(0,R)}. Then

\displaystyle \|f\|_{L^2({\mathbb R}^d)}\leq C\|f\|_{L^2(E)}. \ \ \ \ \ (22)

where the constant {C} depends only on {d} and {\gamma}.

Remark 8 This proof need some very good estimate come from several complex variables.

5. The Malgrange-Ehrenpreis theorem

Theorem 7 Let {\Omega} be a bounded domain in {{\mathbb R}^d} and let {p\neq 0} be a polynomial, Then, for all {g\in L^3(\Omega)}, there exists {f\in L^2(\Omega)} such that {p(D)f=g} in a distribution sence.


Brunn-Minkowski inequality

In this short note, I posed a conjecture on Brunn-Minkwoski inequality and explain why we could be interested in this inequality, what is it meaning for further developing of some fully nonlinear elliptic equation come from geometry. The main part of the note devoted to discuss several different proof of classical Brunn-Minkowski inequality.

Brunn-Minkowski inequality

1. Introduction

I believe, every type of Brunn-Minkowski inequality, type of Brunn-Minkowski inequality is in some special sense and will be explained later, will be crucial with a corresponding regularity result of a fully nonlinear elliptic equation which could be realizable by geometric way which will also explained in further note.

So the key point is that Brunn-Minkowski inequality is crucial and have potential application, I posed a problem there and then consider the classical Brunn-Minkowski inequality, we give several proof of the classical Brunn-Minkowski inequality, everyone could help us to have a more refine understanding of the original difficulty with different angle.

Theorem 1 (conjecture) We have a map

\displaystyle f:{\mathbb R}^d\times {\mathbb R}^d \longrightarrow {\mathbb R}^d \ \ \ \ \ (1)


\displaystyle (x,y) \longrightarrow f(x,y)

We are willing to called the function {f} as the hamiltonian function. then we could consider the hamiltonian flow of the function {f}, but this could only true for a even dimension manifold to make there exists {f} that {df} is a non-degenerate closed {2-}form.

Anyway we consider the level set of {f}, we get a foliation i.e {{\mathbb R}^d=\{\amalg_{p\in {\mathbb R}}f^{-1}(p)\}}. we consider the gradient flow with {f}, called the gradient flow begin with {q\in {\mathbb R}} as {\{\phi_q(t)\}_{t\in {\mathbb R}}}. And we wish the gradient flow have a addition structure on itself then we could consider what is the Brunn-Minkowski inequality in this setting, the condition is a group structure on the space of level set {L_f=\{\amalg_{p\in {\mathbb R}}f^{-1}(p)\}}, i.e.

\displaystyle \forall t_1,t_2\in {\mathbb R}, \forall q\in R^d, \phi_{q}(t_1+t_2)=\phi_{q}(t_1)\circ \phi_{q}(t_2) \ \ \ \ \ (2)

Remark 1 take {f(x,y)=x+y} in 1, this conjecture reduce to the toy model, i.e. classical Brunn-Minkowski inequality.

Remark 2 We could generate the problem to the problem which is charged by several energy function {f_1,...,f_k}, if the induced gradient flow is amenable, then this is somewhat similar with the one dimension case, I wish if we could do something for the single function {f_1}, then we can say something for the several functions involved case.

Remark 3 This could also generate to amenable group action case and quantization of it.

Meaning, the cohomology induce by a hamiltonian system on some special foliation on fiber of geometric bundle. This type of result could help to establish the vanish of the cohomology, the get the existence theory and regularity result for corresponding elliptic nonlinear differential equation. And solve the original problem I consider.

Now we given the statement of Brunn-Minkowski inequality.

Theorem 2 (brunn minkowski inequality) For {A,B} measurale set in {{\mathbb R}^d}. we have following,

\displaystyle \mu^{\frac{1}{d}}(A+B)\geq \mu^{\frac{1}{d}}\mu(A)+\mu^{\frac{1}{d}}\mu(B) \ \ \ \ \ (3)



The general spproach of Brunn-Minkwoski inequality is following,

  1. divide the measurable set {A,B} into small cubes.
  2. Shinking trick, transform the set into convex one.

for the first one, we have the following lemma,

Lemma 3 {\forall 0<\lambda<1}, {\exists \epsilon>0} {A_{\epsilon}} measurable set, {A_{\epsilon}\subset A}, {\mu(A-A_{\epsilon})<\epsilon}, and {A=(A-A_{\epsilon})\amalg \cup_{i\in I}(c_i\cap A_{\epsilon} ) }, and

\displaystyle \frac{\mu(c_i\cap A_{\epsilon})}{\mu(c_i)}>\lambda, \forall i\in I \ \ \ \ \ (4)

Proof: The proof of the lemma is a easy corollary of the construction of Lesbegue(or Borel) measurable {\sigma} algebra. \Box

Remark 4 The existence of the property given in the lemma is not the key point, the key point is {d(A_{\epsilon},A)\rightarrow 0, as\ \ \epsilon\rightarrow 0}.

Has this two simplify in hand, we could give several approach to proof the inequality and these proof carry information more than just a proof, they carry some information with the structure of space {(\{ 0,1 \}_{{\mathbb R}^d},\mu^{\frac{1}{d}},+)}. \newpage

2. A proof with discretization

There is a lots of ways to attack the Brunn-Minkowski inequality, the most natural one is discretization. But unfortunately there is some technique obstacle for proof or even state the discretization version of “Brunn-Minkowski” inequality.

The “boundary” and “area” should not compatible.

\displaystyle \mu_{d-1}(\partial E)<<\mu_{d}(\mu(E)) \ \ \ \ \ (5)

And we need use the fact,

\displaystyle A_{\epsilon}\overset{G-H \ sence}\longrightarrow A \ \ \ \ \ (6)

Now we just state what we expect it should transform in, because we have a fully understanding with the discretization model, there is a result named Cauchy-Daveport inequality.

Theorem 4 (cauchy-daveport inequality) There are two case, one in {{\mathbb Z}}, one in finite field {{\mathbb Z}_p}.

  1. {{\mathbb Z}} case, {\forall A,B\subset {\mathbb Z}} are finite set,we have,

    \displaystyle |A+B|\geq |A|+|B|-1. \ \ \ \ \ (7)

  2. {{\mathbb Z}_p} case, {\forall A,B\subset {\mathbb Z}} are finite set,we have,

    \displaystyle |A+B|\geq \max\{|A|+|B|-1.p\}. \ \ \ \ \ (8)


Proof: for the {{\mathbb Z}} case, the story is more or less trivial, just do to a observation, if {A=\{a_i,a_1<a_2<...<a_n\},B=\{b_i,b_1<b_2<...<b_m\}}, then

\displaystyle a_1+b_1<\min\{a_1+b_2+a_2+b_1\}<....<a_n+a_m

There exists a strictly increasing chain of length at least {n+m-1}.

For the {{\mathbb Z}_p} case, following is a graph to explain what happen, basically we define a operation on tuples, i.e. {T:(A,B)\rightarrow (T(A),T(B))}, and make the additive energy {E_{A,B}:=|A+B|} decreasing. after induction with this transform and the transform from a tuple to the minimum additive energy by translation, the additive energy decreasing and decreasing then arrive the global minimum. But it is easy to conclude in this case one of {A,B} become null set and then the inequality 8 follows. \Box

But when we discrete the Brunn-Minkowski inequality, we expect a high dimension generation of the inequality 4. Naively we wish,

Theorem 5 (naive generation of cauchy-daveport inequality) For {d\in {\mathbb N}}, and {\forall A,B\subset {\mathbb Z}_d} are finite sets,

\displaystyle |A+B|^{\frac{1}{d}}\geq |A|^{\frac{1}{d}}+|B|^{\frac{1}{d}} \ \ \ \ \ (9)

But this is not the case, there is a counterexample for 5. We could construct some {A,B} such that {|A+B|\sim |A|+|B|}, consider they be very thin line.

So why we are in this worse situation? because we lose the information of {A_{\epsilon}\overset{G-H}\longrightarrow A }, {B_{\epsilon}\overset{G-H}\longrightarrow B }. So they have the trend tending to make the “boundary” campatible with “area”. Two thin line in the same direction is exactly the worst case, which is just a equal condition of 1-dimension case.

One natural way to except the situation is to bounded the “isperimetric constant”, to assume {A,B} varies in a subset of measurable set, with addition condition that {\frac{\mu_{d-1}(\partial A)}{\mu_d(A)}} is bounded by some constant. But this is also not the suitable set for our inequality, I explain how to capture the information of the G-H coverage.

Now assume {A,B} are convex bounded set, and we take a global orthogonal basis in {{\mathbb R}^d}. named {(e_1,...,e_d)}. We give the definition of {\epsilon-}discretization of {A}, named {A_{\epsilon}}.

Definition 6 ({\epsilon-} discretization) The construction of {A_{\epsilon}} from {A} is following:

  1. divide {A} into {\amalg_{i\in I}c_i\cap A}, {c_i} is the {\epsilon} cubes.
  2. use {c_i} or {\emptyset} instead of {c_i\cap A} depending on iff {\frac{\mu(A\cap c_i)}{\mu(c_i)}>\lambda}, where {\lambda<1} is a given number only rely on {\epsilon}. i.e.

    \displaystyle A\cap c_i\rightarrow D_{\epsilon} (A\cap c_i) \ \ \ \ \ (10)

  3. glue them, define {A_{\epsilon}:=\amalg_{i\in I}D_{\epsilon}(c_i\cap A)}.


Now we describe the condition of {A_{\epsilon}\overset{G-H}\rightarrow A} rigorous meaning campatible with {\epsilon-}discretization.

Under the basis, there is a coordinate we could know iff {c_i=c(\lambda_1,...,\lambda_d)} is the cube center at {(\epsilon\lambda_1,...,\epsilon\lambda_d )} if it is in {A_{\epsilon}}. Due to {A} is convex, {\partial A} is lipchitz. So you will have some locolization property, said, at every fix discretization scale {\epsilon}, the position of {(\lambda_1\epsilon ,...,\lambda_d\epsilon)} is morally known so the number of cubes in {A_{\epsilon}} in the one dimensional affine space {\Omega^{\epsilon}_{a_1,...,a_{k-1},a_{k+1},...,a_n}} which is the subspace of {{\mathbb Z}^d} the number {\Omega^{\epsilon}_{a_1,...,a_{k-1},a_{k+1},...,a_n}} is asymptopic to the {1-}dimensional hausdorff measure of {A\cap \Omega_{a_1,...,a_{k-1},a_{k+1},...,a_n}}. So at least,

\displaystyle |\Omega^{\epsilon}_{a_1,...,a_{k-1},a_{k+1},...,a_n}|=O((\frac{1}{\epsilon})) \ \ \ \ \ (11)


Property 11 is crucial, which mean {A_{\epsilon}} is really a n-dimensional space and automatically we have the bounded on isoperimetric constant {\frac{\mu_{d-1}(\partial A)}{\mu_d(A)}}.

Now we can look at every {A_{\epsilon}} and take limit {\epsilon\rightarrow \infty}. In fact we a in the situation with Accumulation of wood to make the product have smallest volume. Not to optimized the tuples {(A,B)} but fix one of it, said {B}, optimized the other one, said {A}. This is the key point of proof, a little bit different from the argument of {1-}dimensional 4 where we optimized the tuple.

Key point:

  1. we can ignore “small core”.
  2. This inequality is said, due to {A+B=(\frac{A+B}{2})+\frac{A+B}{2}}, the convex of the functional {\mu^{\frac{1}{d}}} on convex set.

The way of discretization could not handle the problem but definitely said that the difficulty occur with the shape of boundaries {\partial A,\partial B}.

3. A proof with “central of mass” and Minkowski functional

Definition 7 (Central of mass) The central of mass {p} of measurable set {A}, if exist, satisfied, {\forall e\in S^d}, there is a subspace {L_e} with codimension 1 divide {A} into two connected part {A_{1,L_e},A_{2,L_e}} such that

\displaystyle \mu(A_{1,L_e})=\mu(A_{2,L_e}) \ \ \ \ \ (12)

then {p\in l_e}.

Remark 5 For a measurable set {A} , if central of mass {p} exists, then there exist only one. This is a easy observation do to the definition of {p}, i.e. the intersection of suitable affine subspace in every direction.

Theorem 8 (existence of central of mass)

Proof: It is easy to attain {p} by take {n+1} different directions in {S^d}, then easy to proof every line {L_e} across it be definition of {L_e}. \Box

Definition 9 (Minkowski functional) for a measurable set {X\subset {\mathbb R}^d} and a point {p}, define {M_{X,p}} on {S^{d-1}}, such that

\displaystyle M_X(e)=\sup_{\lambda>0,\lambda e+p\in X}\lambda \ \ \ \ \ (13)



Remark 6 If {A} is convex, then {M_A} is a convex function on {S}, so it is lipchitz.

we have following formula for the measure of {A}.

Theorem 10

\displaystyle \mu(A)=\frac{1}{\mu(S^{d-1})}\int_{S^{d-1}}M_A(e)de \ \ \ \ \ (14)

Proof: trivial. \Box

Now the task reduce fixing {B} and {\mu(A)} to optimized {A} make {\mu(A+B)} small. It is the same as make {\frac{1}{\mu(S^{d-1})}\int_{S^{d-1}}M_{A+B}(e)de} small when fix {\frac{1}{\mu(S^{d-1})}\int_{S^{d-1}}M_A(e)de} and {M_B}. Due to

\displaystyle M_{A+B}(x)=\sup_{x_1,x_2\in S^{d-1}}M_A(x_1)+M_B(x_2) \ \ \ \ \ (15)

This lead to the whole story, given a proof of 3.

4. A proof with multi-scale analysis

This approach is a nonstandard one, due to I believe the renormlization or continue fractional or multilinear estimate is everywhere. We first play with a toy model, the rectangle.

Theorem 11 Brunn-Minkowski inequality is right for {A,B} are rectangles.


\displaystyle (\Pi(a_1+b_i) )^{\frac{1}{d}}\geq (\Pi(a_1) )^{\frac{1}{d}}+(\Pi(a_1+b_i) )^{\frac{1}{d}}

\displaystyle \Leftrightarrow 1\geq [\frac{\Pi a_i}{\Pi(a_i+b_i)}]^{\frac{1}{d}}+[\frac{\Pi a_i}{\Pi(a_i+b_i)}]^{\frac{1}{d}} \ \ \ \ \ (16)


In fact,

$latex \displaystyle \begin{array}{rcl} 16 RHS & \overset{A-G}\leq &\frac{1}{d}\sum_i\frac{a_i}{a_i+b_i}+ \frac{1}{d}\sum_i\frac{b_i}{a_i+b_i}\\ & = & 1. \end{array} &fg=000000$


The story is following,

5. connection of Brunn-Minkowski inequality and Sobolev inequality, the firth proof

We begin with a calculate based on intuition and it is not rigorous.

\displaystyle \begin{array}{rcl} \mu(A+B)^{\frac{1}{d}} & = & [\int_{{\mathbb R}}(\int_{{\mathbb R}^{d-1}}\chi_{A+B}(\xi_1,...,\xi_{n-1})d\xi_1...d\xi_{n-1} )d\xi_n]^{\frac{1}{d}}\\ & \sim & \frac{1}{\mu(A_n)}\int_{{\mathbb R}}(\int_{{\mathbb R}^{d-1}}\chi_{A+B}(\xi_1,...,\xi_{n-1})d\xi_1...d\xi_{n-1})^{\frac{1}{d-1}}d\xi_n\\ &\overset{induction \ on \ d} \geq & \frac{1}{\mu(A_n)}(\mu^{\frac{1}{d-1}}(A(\xi_n))+\mu^{\frac{1}{d-1}}(B(\xi_n)))d\xi \\ &\overset{A-G \ inequality} \geq & \mu^{\frac{1}{d}}(A)+\mu^{\frac{1}{d}}(B). \end{array}

The second line is due to I believe there {\exists \ A,B} such that it is a equality, by the equal condition of Minkowski inequality, in fact this is morally inverse of Minkowski inequality. The second reason in general case why the second inequality is true is due to a rescaling argument, change {(\xi_1,...,\xi_n)\rightarrow (\lambda\xi_1,...,\lambda\xi_n ), \forall \lambda\in {\mathbb R}}, by the rescaling argument we conclude if there is a such inequality, the index of it must be the case.