Multiplicative functions in short intervals

The most important breakthrough in analytic number theory is the new understanding of multiplicative functions in short intervals. This result was established by Kaisa Matomäki and Maksym Radziwiłł, two very young and brilliant stars.

The main theorem in their article is:

As soon as H\to \infty as X\to \infty, one has:

                    \sum_{x\leq n\leq x+H}\lambda(n)= o(H)

for almost all x\sim X, where \lambda is the Liouville function.
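To get a feeling for the statement, one can compute the short-interval averages numerically. The following is only a minimal sketch, not the method of the paper: it assumes that \lambda is the Liouville function, replaces the integral over x by an average over integers x in [X,2X], and the function names `liouville_table` and `short_interval_mean_square` are hypothetical.

```python
def liouville_table(limit):
    """Return lam with lam[n] = Liouville function lambda(n) for 1 <= n <= limit."""
    # Smallest-prime-factor sieve.
    spf = list(range(limit + 1))
    p = 2
    while p * p <= limit:
        if spf[p] == p:  # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
        p += 1
    lam = [0] * (limit + 1)
    if limit >= 1:
        lam[1] = 1
    for n in range(2, limit + 1):
        # Complete multiplicativity: lambda(n) = lambda(p) * lambda(n / p) = -lambda(n / p).
        lam[n] = -lam[n // spf[n]]
    return lam


def short_interval_mean_square(X, H):
    """Average over integers x in [X, 2X] of |(1/H) * sum_{x <= n <= x+H} lambda(n)|^2."""
    lam = liouville_table(2 * X + H)
    prefix = [0] * (2 * X + H + 1)
    for n in range(1, 2 * X + H + 1):
        prefix[n] = prefix[n - 1] + lam[n]
    total = 0.0
    for x in range(X, 2 * X + 1):
        window = prefix[x + H] - prefix[x - 1]  # sum over x <= n <= x + H
        total += (window / H) ** 2
    return total / (X + 1)


if __name__ == "__main__":
    for H in (10, 100, 1000):
        print(H, short_interval_mean_square(100000, H))
```

If the theorem is a good guide at this scale, the printed mean squares should shrink as H grows; the script only illustrates the statement and proves nothing.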


As I understand the result, the main strategy is:

Step 1: Parseval identity, comparison inequality

This step concerns the L^2 norm of the quantity we wish to control. The goal is to understand

\frac{1}{X}\int_{X}^{2X}|\frac{1}{H}\sum_{x\leq n\leq x+H}\lambda(n)|^2dx

up to an acceptable loss, by means of a more tractable quantity:

  \frac{1}{X^2}\int_{0}^{\infty}|\sum_{n\leq X}\lambda(n)n^{it}|^2dt

In fact we truncate the integral; the quantity we really consider is:

\frac{1}{X^2}\int_{(\log X)^{100}}^{\frac{X}{H}}|\sum_{n\leq X}\lambda(n)n^{it}|^2dt

and we establish the comparison inequality:

\frac{1}{X}\int_{X}^{2X}|\frac{1}{H}\sum_{x\leq n\leq x+H}\lambda(n)|^2dx \ll \frac{1}{X^2}\int_{(\log X)^{100}}^{\frac{X}{H}}|\sum_{n\leq X}\lambda(n)n^{it}|^2dt

As I understand it, this is a change of perspective on the quantity. Since it is a multiplicative function summed over a domain (\mathbb N^*) with additive structure, it can be viewed as a superposition of many waves with periods given by the primes. So we can perform an orthogonal decomposition in frequency space, try to prove that the part removed by the truncation is an error term, and obtain such a comparison inequality.

Once we have the inequality, we can also view it as a compactification process which still carries most of the information, and this is what leads to the inequality.

Something similar seems to occur in the second author's work on moment estimates for the zeta function. It can also be viewed as something like a spectral decomposition with a basis coming from the multiplicative structure, i.e. the primes.


Step 2: Using the multiplicative property, “spectral decomposition”

I call this a “spectral decomposition”, although this is not very exact. In any case, the point is that for the multiplicative function \lambda(n) we have the Euler product formula:

                      \prod_{p \text{ prime}}\frac{1}{1-\frac{\lambda(p)}{p^s}}=\sum_{n=1}^{\infty} \frac{\lambda(n)}{n^s}

However, we do not use the full power of multiplicativity, only multiplicativity at primes, i.e. \lambda(pn)=\lambda(p)\lambda(n), which leads to the following result:

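\lambda(n)=\sum_{n=pm,p\in I}\frac{\lambda(p)\lambda(m)}{\# \{q|m, q\in I\}+1}+\lambda(n)1_{p|n\Rightarrow p\notin I}

(Here the denominator counts the primes q\in I dividing the cofactor m, and the identity is exact for those n which are not divisible by p^2 for any p\in I.)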

This is an identity for the function \lambda(n). The point is that it does not use multiplicativity only at a single point, i.e. \lambda(mn)=\lambda(m)\lambda(n), but averages it over a region which arises naturally and is compatible with multiplication. The identity therefore carries a lot of information about the multiplicative structure, which is crucial for getting a good estimate of the quantity under consideration.
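As a sanity check, the decomposition can be tested numerically. The sketch below is only an illustration under stated assumptions: it takes the denominator to count the primes of I dividing the cofactor m (the form in which the identity is exact), restricts to n not divisible by the square of any prime of I, and uses hypothetical helper names (`liouville`, `ramare_rhs`, `identity_holds`) with an arbitrary choice of I.

```python
from fractions import Fraction


def liouville(n):
    """Liouville function: (-1)^(number of prime factors of n, with multiplicity)."""
    count = 0
    d = 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    if n > 1:
        count += 1
    return -1 if count % 2 else 1


def ramare_rhs(n, primes_in_I):
    """Right-hand side of the decomposition of lambda(n) with respect to the set I."""
    dividing = [p for p in primes_in_I if n % p == 0]
    if not dividing:
        # No prime factor in I: only the indicator term survives.
        return Fraction(liouville(n))
    total = Fraction(0)
    for p in dividing:
        m = n // p
        weight = sum(1 for q in primes_in_I if m % q == 0) + 1
        total += Fraction(liouville(p) * liouville(m), weight)
    return total


def identity_holds(limit, primes_in_I):
    """Check the identity for all n <= limit with no p in I such that p^2 | n."""
    for n in range(1, limit + 1):
        if any(n % (p * p) == 0 for p in primes_in_I):
            continue
        if ramare_rhs(n, primes_in_I) != liouville(n):
            return False
    return True


if __name__ == "__main__":
    print(identity_holds(5000, [5, 7, 11, 13]))
```

Exact rational arithmetic (`Fraction`) is used so that the comparison is not affected by rounding. For n divisible by p^2 with p\in I the two sides can differ, for example n=25 and I=\{5\}, which is why such n are excluded.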


Step 3: From linear to multilinear, Cauchy-Schwarz

Now we do not use a single set I but several carefully chosen sets I_1,...,I_n. We also no longer consider [X,2X] with its linear structure; instead we consider the decomposition:

[X,2X]=\coprod_{i=1}^n (I_i\times J_i) \amalg U

Each I_i\times J_i is equipped with a bilinear structure, and U is a very small set, $|U|=o(X)$, which in fact admits a much better estimate.

\int_{(\log X)^{100}}^{\frac{X}{H}}|\sum_{n\leq X}\lambda(n)n^{it}|^2dt =\sum_{i=1}^n\int_{I_i\times J_i}  \frac{1}{X^2}\int_{(\log X)^{100}}^{\frac{X}{H}}|\sum_{n\leq X}\lambda(n)n^{it}|^2dt +\int_N |\sum_{n\leq X}\lambda(n)n^{it}|^2dt

Now we simply apply Cauchy-Schwarz:

\sum_{i=1}^n\int_{I_i\times J_i}  \frac{1}{X^2}\int_{(\log X)^{100}}^{\frac{X}{H}}|\sum_{n\leq X}\lambda(n)n^{it}|^2dt +\int_N |\sum_{n\leq X}\lambda(n)n^{it}|^2dt


Step 4: Major term estimate


Step 5: Minor term estimate


Step 6: Estimate the contribution of the region which is not covered



Transverse intersections

This problem may be an embarrassing one, but I could not prove it even in the 1-dimensional case.

Here is the problem:

>**Question 1** Let M be a compact n-dimensional smooth manifold in R^{n+1} and take a point $p\notin M$. Prove that there is always a line l_p passing through p such that l_p\cap M\neq \emptyset and l_p intersects M transversally.

You can naturally generalize it to:
>**Question 2** Let M be a compact $n$-dimensional smooth manifold in R^{n+m} and take a point p\notin M. Prove that $\forall 1\leq k\leq m$ there is always a plane P_p with dim(P_p)=k passing through $p$ such that P_p\cap M\neq \emptyset and P_p intersects M transversally.

Thanks to Piotr for pointing this out: we assume “transverse” means “the tangent spaces intersect only at 0”.

We focus on Question 1 for simplicity.

Even in dimension 1 this is not easy, at least for me. **Warning**: a line l through p may intersect $M$ at several points, forming a set A_l, and A_l can be finite, countable or even uncountable (consider M given by a smooth function whose zero set is a Cantor set.)… And if there is a point a\in A_l at which $l$ coincides with the tangent line of M, then l does not intersect M transversally.

**My attempt**:
I can use a dimension argument and Sard’s theorem to establish a similar result in which, instead of a fixed point p, we prove that for a generic point of R^{n+1} not in M we can choose such a line.

So it seems reasonable to develop the dimension technique to attack Question 1. In dimension 1, this leads to investigating the ordinary differential equation:


Here p=(a,b) and M has the parameterization M=\{(x,f(x))\}. If there is a counterexample to Question 1, then there is another solution which satisfies the ODE in the following sense:
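The displayed equation did not survive in the text above. As a hedged guess only, assuming the ODE is meant to express that the line through p=(a,b) and the point (x,f(x)) is tangent to the graph of f there, the condition would read:

```latex
f'(x)\,(x-a) = f(x)-b
```

Under this assumption, a line through p fails to be transverse to M exactly at the points where this relation holds. On an interval where it holds identically, (f(x)-b)/(x-a) is constant, so the graph is a piece of a line through p.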

for every line l there is at least one intersection point a_l\in l\cap M at which f satisfies the ODE.

This looks as if the uniqueness of solutions of such an ODE is destroyed on some subset of a line which has a special linear structure. I do not know whether this point of view will be helpful.

I would appreciate any useful answers and comments.

Proof 1 (provided by fedja)

Area trick. (Weakness: it seems this method cannot prove that the set of transversal intersection points has positive measure.)

Proof 2 (provided by Piotr)

#For the codimension 1 case.#

###Using Thom transversality theorem.###
Consider the maps f_s:\mathbb{R} \to \mathbb{R}^n parametrized by s \in S^{n-1} and given by f_s(t) = p + t \cdot s. The map F(s,t) = f_s(t), F:S^{n-1} \times \mathbb{R} \to \mathbb{R}^n is clearly transverse to M, thus Thom’s transversality says that f_s is transverse to M for almost all s. Now it suffices to prove that for an open set in S^{n-1}, the line given by f_s intersects M. Proven below.

###Using Sard’s theorem directly.###
Thom’s transversality is usually proven using Sard’s theorem. Here is the idea.

Consider the projection \Pi:\mathbb{R}^n \setminus \{p\}\to S^{n-1}_p onto a sphere centered at p. A line l_p through p intersects M transversally if the two points l_p \cap S^{n-1}_p are regular values of \Pi (indeed, the critical points of \Pi are exactly the points x \in M at which the normal \vec n_x is perpendicular to the radial direction (with respect to $p$)). By Sard’s theorem, the set of regular values is dense in S^{n-1}_p.

We need to choose any point s on the sphere for which both s and -s are regular values, and the line f_s through p and s actually intersects M. It suffices to prove that the set of points s for which this line intersects M contains an open set. We could now use the Jordan-Brouwer Separation Theorem and we would be done, but we can do it more directly (and in a way that seems to generalize).

###The set of points s \in S for which f_s intersects M has nonempty interior.###
For each point q \notin M the projection \Pi:M \to S_{q,\varepsilon_q}^{n-1} onto the sphere centered at q, of radius \varepsilon_q small enough so that the sphere does not intersect M, has some (topological) degree d_q. It is easy to check that if one takes any point x \in M and considers the points x \pm \delta \vec n_x for small \delta, the degrees of the corresponding maps differ by 1. It follows that we can find a point q for which d_q \neq d_p, which guarantees that for every point q' in a small open ball B around q (all these points have same degree d_q), the line joining p and q' intersects M. Projection of B on S_p^{n-1} is an open set which we sought.

#For the general case (partial solution).

I think a similar reasoning should work, however, notice that for k < m we cannot make P_p intersect transversally with M because of dimensional reasons: the dimensions of M and P_p don’t add up to at least n+m. Recall that transversality implies Thus, either (1) you want to consider k \geq m, or (2) define “transversal intersection” for such manifolds saying that the tangent spaces have to intersect at an empty set.

Also, for k>m we can just take any plane P_p which works for k=m and just extend it to a k-dimensional plane.

###Assuming k = m.###
A similar reasoning should work for f_s:\mathbb{R}^m \to \mathbb{R}^{n+m} with s = (s_1, \ldots, s_m) going over all families of pairwise perpendicular unit vectors, and f_s(t_1,\ldots,t_m) = p+\sum_{i=1}^m t_i \cdot s_i. Thom’s transversality says that for almost all choices of s, the plane f_s is transverse to M.

### The nonempty interior issue. ###
The only thing left is to prove that the set of s for which the intersection is nonempty has nonempty interior. Last time we proved that there is a zero-dimensional sphere containing p, namely \{p,q\}, which has nonzero linking number with M, and by deforming it to spheres \{p,q'\} and taking lines through pairs p,q', we got an open set of parameters for which the line intersects M.

Here we should be able to do a similar trick by finding an (m-1)-dimensional sphere with nonzero linking number with M. The ball that bounds that sphere has to intersect M, thus the plane P containing the sphere has to intersect M. By perturbing the sphere we get spheres with the same linking numbers, and get all the planes that lie in a neighbourhood of P; in particular, we get an open set of parameters s for which f_s intersects M.

Well, we don’t actually need a *round* sphere, but we do need a *smooth* sphere that lies in a m-dimensional plane. There’s some trickery needed to do this, but I am sure something like this can be done.

Maybe somebody else can do it better?

### For k<m ###

I don’t really know how to attack this case, assuming “transverse” means “the tangent spaces intersect only at 0“.


An approach to Vinogradov's estimate

Vinogradov's estimate is:

|\sum_{n=1}^{N}e^{2\pi i\alpha P(n)}|\leq c_A\frac{N}{\log^A N}

for fixed irrational \alpha and \forall A>0 ... (*).

Assume deg(P)=n. This can be viewed as an effective equidistribution result for the dynamical system ([0,1]^n,T), where T: x\to (A+B)x, B is a nilpotent matrix, and the matrix A is the identity but with the irrational number \alpha in the (n, n) entry.
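To see the cancellation in the exponential sum concretely, here is a small numerical sketch. It is only an illustration, not part of any proof: the function name `weyl_sum`, the choice \alpha=\sqrt 2 and the polynomial P(n)=n^2 are assumptions made for the example, and P is assumed to have integer coefficients.

```python
import cmath
import math


def weyl_sum(N, alpha, coeffs):
    """Return sum_{n=1}^{N} exp(2*pi*i*alpha*P(n)), P(n) = coeffs[0] + coeffs[1]*n + ...

    The coefficients are assumed to be integers, so P(n) is computed exactly.
    """
    total = 0j
    for n in range(1, N + 1):
        value = 0
        for c in reversed(coeffs):  # Horner evaluation of P(n)
            value = value * n + c
        phase = (alpha * value) % 1.0  # reduce mod 1 before calling exp
        total += cmath.exp(2j * math.pi * phase)
    return total


if __name__ == "__main__":
    alpha = math.sqrt(2)
    for N in (100, 1000, 10000):
        # Normalised size |S_N| / N; it should decrease if there is cancellation.
        print(N, abs(weyl_sum(N, alpha, [0, 0, 1])) / N)
```

For integer \alpha every term equals 1 and there is no cancellation at all, which is a convenient way to check the code.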

First approach

We can easily get an “equidistribution on fibers” result, without any really hard estimates, as a first attack on the theorem. It is just an application of my “rigid trick” described in an earlier note. This approach rests on understanding the result as an equidistribution statement on the torus T^n: we can carry it out on the last S^1, which corresponds to \partial^{n-1}x_k, i.e. we can apply the “rigid trick” to prove that the sequence (x_k,\partial^1 x_k,..., \partial^{n-1} x_k) is equidistributed with respect to \partial^{n-1} x_k\in S^1 .


But this approach seems difficult to generalize. The difficulty comes from two sources. First, there is no similar equidistribution result for the other parameters via the rigid trick (at least as far as I know; I tried to prove one but failed). Second, even in the best case, where we have a similar equidistribution result for the other parameters, something more still needs to be established. See this graph for a counterexample showing that equidistribution on every fiber does not imply equidistribution on the original space.


Second approach 

In this approach we need to use information from continued fractions (which is of course critical for obtaining the estimate). I do not know whether this is necessary; whether information from continued fractions must be involved in such an estimate may be an interesting question for the future, but not for today.

Anyway, there are two different types of continued fractions:



These can be understood as more or less the same thing (in fact we can compute some quantities in terms of a_i,q_i which are roughly the same). The point is that the orbit \{e^{2\pi i\alpha}\} has a quasi-periodic property, that is to say, under suitable norms it can be understood as a limit of periodic sequences. So it is natural to approximate \{e^{2\pi i\alpha}\} by periodic sequences, which leads to a very good pointwise convergence result:

T_k^{n}(x) \longrightarrow T^{n}(x)

Where T_k^{n}(x)=e^{2\pi i\sum_{i=1}^k\frac{1}{p_1...p_i}} is the periodic approximating sequence coming from the best approximations (critical points of ||\frac{q}{p}-\alpha||), which occur naturally in the continued fraction expansion. In this way we already arrive at a non-quantitative form of (*) for deg(P)=1.

Unfortunately this approximation is too good to be true in the case deg(P)\geq 2. The reason it holds in degree 1 is just the natural estimate for the best approximations of \alpha, i.e. the Dirichlet approximation theorem.

For the higher degree case, although we cannot expect this to be true, we can still imagine that a weaker but sufficient result holds:

\{T_k^{n}(x)\} \longrightarrow \{T^{n}(x)\}

in the Gromov-Hausdorff sense, where the $T_k^n(x)$ are carefully chosen and carry a finite torsion structure (which can be viewed as a multilinear structure that will play a central role in the estimate). Here is a graph for deg(P)=2:


Roughly speaking, in the general case deg(P)=n there is a cube structure in the orbit e^{2\pi iP(n\alpha)}, and it is critical to observe the structure of progressions of differences in it. The goal of this approach is to derive results from the finite torsion structure (multilinear structure). That is to say, the boundary is a higher-order object in all directions, but only one direction tends to infinity while the others are just finite torsion, and we wish to extract more information from this extra structure.

This also has a physical interpretation, for which see the graph:


Third approach

For the case P(n)=an^2+bn+c:


A discrete-to-continuous approach to the Dirichlet principle.

Dirichlet principle:
\Omega \subset R^n is a compact set with $C^1$ boundary. Then there exists a unique solution $f$ satisfying $\Delta f=0$ in $\Omega$, $f=g$ on \partial \Omega.

Perron lifting and barrier functions

We know that the standard approach to the Dirichlet principle is Perron lifting together with the construction of barrier functions on the boundary.

The key point is that if we define the variational energy E(u)=\int_{\Omega}|\nabla u|^2, then it is easy to see that for u_1,u_2 in the Perron set, E(sup (u_1,u_2))\geq \max\{E(u_1),E(u_2)\}. So we can start from a maximizing sequence, construct a Cauchy sequence by Perron lifting, and use barrier functions to make the solution compatible with the boundary condition, which gives a proof.

But when I was a freshman and did not know the method of Perron lifting, I tried something I call the discrete-to-continuous approach to solve the problem. It has always been a puzzle in my mind whether we can establish the Dirichlet principle in this way. Roughly speaking, it is divided into two parts:

1. Investigate the discretization of harmonic functions at smaller and smaller scales. The discretization I consider is just \Omega\cap \epsilon \mathbb Z^n, i.e. the \epsilon-lattice in $\Omega$, with the discretized Laplace operator \Delta_{\epsilon}u(x_1,...,x_n)=\sum_{i_1,...,i_n\in\{-1,1\}}\frac{u(x_1+\epsilon i_1,...,x_n+\epsilon i_n)}{2^n}-u(x_1,...,x_n). Some results are much easier to obtain in the discrete setting: for example, the existence of a solution comes from simple linear algebra, and we can deduce the Harnack inequality, gradient estimates, and even the Green function. So we get a solution \hat f_{\epsilon} of the \epsilon-discretization, and we extend it from \Omega\cap \epsilon \mathbb Z^n to \Omega by giving each small cube the value of \hat f_{\epsilon} at its center; this gives f_{\epsilon}.


2. The second step is to prove that the solutions f_{\epsilon} of the \epsilon-discretized problems converge to the solution of the original problem, i.e. we want to prove an L^{\infty} estimate: \forall \delta>0, \exists \epsilon>0, \forall 0<\epsilon_1,\epsilon_2<\epsilon we have \forall x\in \Omega, |f_{\epsilon_1}(x)-f_{\epsilon_2}(x)|<\delta, and then construct f by the Arzelà-Ascoli theorem. Then we need to prove that $f$ is the harmonic function we are looking for; to verify this we use the mean-value property. So we need to prove that f satisfies the mean-value property for every ball in \Omega.
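The first step can be tried out on a computer. The sketch below is only an illustration under stated assumptions: it works on the unit square in dimension 2, it swaps in the standard nearest-neighbour (5-point) average rather than the diagonal stencil written above, it solves the linear system by Gauss-Seidel sweeps instead of direct linear algebra, and the names `solve_discrete_dirichlet` and `max_mean_value_defect` are hypothetical.

```python
def solve_discrete_dirichlet(n, boundary, sweeps=2000):
    """Discrete harmonic function on an (n+1) x (n+1) grid of the unit square.

    boundary(x, y) gives the boundary values g; interior values are obtained by
    Gauss-Seidel sweeps enforcing the discrete mean-value property.
    """
    h = 1.0 / n
    u = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(n + 1):
            if i in (0, n) or j in (0, n):
                u[i][j] = boundary(i * h, j * h)
    for _ in range(sweeps):
        for i in range(1, n):
            for j in range(1, n):
                # Replace the value by the average of its four nearest neighbours.
                u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
    return u


def max_mean_value_defect(u):
    """Largest violation of the discrete mean-value property at interior nodes."""
    n = len(u) - 1
    return max(
        abs(u[i][j] - 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]))
        for i in range(1, n)
        for j in range(1, n)
    )


if __name__ == "__main__":
    g = lambda x, y: x * x - y * y  # a harmonic polynomial used as boundary data
    u = solve_discrete_dirichlet(8, g)
    print(max_mean_value_defect(u))
```

The boundary data x^2-y^2 is convenient for testing because it is harmonic both in the continuous sense and for the 5-point stencil, so the discrete solution should agree with it at every grid point up to the iteration error.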

Here is my first question:
> **Question 1:** How can one prove the L^{\infty} estimate, and that the MVP of the \epsilon-discretization converges to the MVP in the R^n case, as needed in the second step?

My attempt at the L^{\infty} estimate is by renormalization, which seems like it could work. The annoying part is proving that the discrete mean-value property converges to the real one; I tried to use some results on random walks, but it does not seem to work…

My second question is:
> **Question 2:** Is this approach a universal phenomenon? At least, could we use it to establish the existence of solutions for linear elliptic and parabolic equations?

The third question is:
> **Question 3:** If we consider an inverse problem, that is to say, we start from an MVP instead of a PDE to derive a solution, is this always possible? For example, if we change the mean value property for harmonic functions from averages over balls to averages over cubes, triangles, ellipses or something else, what happens? Is there always a solution satisfying the new MVP pointwise? If not, is there a counterexample? On the other hand, if yes, do such solutions come from some PDE?

Weyl law

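In 1911, when Weyl was a young mathematician specializing in integral equations and PDE, he proved an important result on the asymptotics of the eigenvalues of the Dirichlet problem on a compact domain \Omega\subset R^d, i.e.

N(\lambda)=(2\pi)^{-d}\omega_d Vol(\Omega)\lambda^{\frac{d}{2}}(1+o(1))

where N(\lambda) is the number of Dirichlet eigenvalues of -\Delta which are at most \lambda and \omega_d is the volume of the unit ball in R^d.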
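The asymptotic can be checked by hand in a case where the spectrum is explicit. The following sketch assumes the standard normalization N(\lambda)\sim(2\pi)^{-d}\omega_d Vol(\Omega)\lambda^{d/2}, with \omega_d the volume of the unit ball, and takes \Omega=(0,\pi)^2, where the Dirichlet eigenvalues of -\Delta are j^2+k^2 with j,k\geq 1. The function names are hypothetical.

```python
import math


def dirichlet_count_square(lam):
    """Number of pairs (j, k), j, k >= 1, with j^2 + k^2 <= lam (lam a positive integer)."""
    count = 0
    j = 1
    while True:
        rest = lam - j * j
        if rest < 1:
            break
        count += math.isqrt(rest)  # number of k >= 1 with k^2 <= rest
        j += 1
    return count


def weyl_prediction_square(lam):
    """Leading Weyl term for (0, pi)^2: (2*pi)^(-2) * omega_2 * pi^2 * lam = pi * lam / 4."""
    omega_2 = math.pi  # area of the unit disc
    return (2 * math.pi) ** (-2) * omega_2 * math.pi ** 2 * lam


if __name__ == "__main__":
    for lam in (100, 10000, 1000000):
        print(lam, dirichlet_count_square(lam) / weyl_prediction_square(lam))
```

The ratio should approach 1 from below: counting eigenvalues is counting lattice points in a quarter disc of radius \sqrt\lambda, and the points on the axes are excluded by the Dirichlet condition.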

This was in fact a conjecture of *** in ***, published in 1910.

This is definitely an amazing achievement. Its real meaning is that we can actually control the spectral asymptotics.

In fact, the only things we know are that eigenfunctions with different eigenvalues are orthogonal and that we have a criterion, the max-min principle, for the k-th eigenvalue. But how can we control their asymptotics? It does not seem tractable: although we have an L^2 isometry (the spectral expansion), it still does not seem tractable. The main difficulty comes from compactness, which just means a division of the whole space; we consider the X-ray transform from every point to the whole space, which needs to be a Poisson kernel, or we change it to be a pare matrix.


Yes, we can look at this phenomenon as two different worlds: one is the real world of Euclidean space, the other is a wave function world. The second world is built from a single solution u of \Delta u=\lambda u for some eigenvalue \lambda, and every unit in this world is a translation or rescaling of u. Then things become interesting: how should we understand the others, i.e. the other eigenvalues and eigenfunctions? They must be u after some combination of rescaling, translation and rotation! So there is a dynamical system acting on it! And if we only allow affine maps, i.e. combinations of translations and rotations only, then we just get all the eigenfunctions with the same eigenvalue.

Now let us see what this is: it just needs to be compatible with the boundary condition, so it needs to be a moduli space in which the boundary map is a measure that is reachable using only affine transformations of the function (we view it as an observable). So it is a restriction from a high-dimensional space to its boundary. And very fortunately it could assume a



Sarnak conjecture: understanding via a standard model

The Sarnak conjecture lies in the overlap of dynamical systems and number theory. It mainly focuses on understanding the behavior of zero-entropy dynamical systems by looking at the correlation between an observable and the Möbius function.

We state it rigorously:

Let (X,T) be a topological dynamical system of zero entropy. Let the Möbius function be defined by \mu(n)=(-1)^t if n is squarefree, where t is the number of distinct primes in the factorization of n, and \mu(n)=0 otherwise.

Then for any continuous function f:X\to R and any x\in X, the observable \xi(n)=f(T^n(x)) is orthogonal to the Möbius function, i.e.

\sum_{n=1}^{N}\mu(n)\xi(n)=o(N).
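The orthogonality statement can be explored numerically in the simplest zero-entropy example, a circle rotation. This is only a sketch under assumptions: the observable \xi(n)=\cos(2\pi n\alpha) with \alpha=\sqrt 2 is chosen for illustration, the sum is normalized by N, and the names `mobius_table` and `normalized_correlation` are hypothetical.

```python
import math


def mobius_table(limit):
    """Return mu with mu[n] = Moebius function of n for 1 <= n <= limit (mu[0] = 0)."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(p, limit + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]  # one sign flip per distinct prime factor
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0  # n divisible by a square
    return mu


def normalized_correlation(N, alpha):
    """Return (1/N) * sum_{n=1}^{N} mu(n) * cos(2*pi*n*alpha)."""
    mu = mobius_table(N)
    total = sum(mu[n] * math.cos(2 * math.pi * n * alpha) for n in range(1, N + 1))
    return total / N


if __name__ == "__main__":
    for N in (1000, 10000, 100000):
        print(N, normalized_correlation(N, math.sqrt(2)))
```

If the conjecture's prediction is visible at this scale, the printed values should tend to 0 as N grows; the experiment is of course no substitute for a proof.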

I mainly focus on two special cases: when the dynamical system is a skew product on T^2, and when it is an interval exchange on [0,1].

Skew product

For the first one, T:\mathbb T^2\longrightarrow \mathbb T^2, T(x,y)=(x+\alpha,cx+y+h(x)), where c=1,-1, and T^n(x,y)=(y_1(n),y_2(n)) with:
y_1(n)=x+n\alpha, y_2(n)=cnx+c\frac{n(n-1)}{2}\alpha+y+\sum_{i=0}^{n-1}h(x+i\alpha).
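A closed form for the iterates of a skew product is easy to get wrong, so here is a small numerical cross-check. It is a sketch under assumptions that are not taken from the text: the map is taken to be T(x,y)=(x+\alpha,cx+y+h(x)) on the torus with c=\pm 1, the perturbation h(x)=0.1\sin(2\pi x) is a hypothetical choice, and all function names are made up for the example.

```python
import math


def skew_step(x, y, alpha, c, h):
    """One application of the assumed map T(x, y) = (x + alpha, c*x + y + h(x)) mod 1."""
    return (x + alpha) % 1.0, (c * x + y + h(x)) % 1.0


def iterate(x, y, n, alpha, c, h):
    """Apply skew_step n times."""
    for _ in range(n):
        x, y = skew_step(x, y, alpha, c, h)
    return x, y


def closed_form(x, y, n, alpha, c, h):
    """Closed form of T^n(x, y) under the same assumptions."""
    first = (x + n * alpha) % 1.0
    drift = sum(h(x + i * alpha) for i in range(n))
    second = (c * n * x + c * (n * (n - 1) // 2) * alpha + y + drift) % 1.0
    return first, second


def circle_distance(a, b):
    """Distance between two points of the circle R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


if __name__ == "__main__":
    h = lambda t: 0.1 * math.sin(2 * math.pi * t)
    direct = iterate(0.3, 0.7, 50, math.sqrt(2), 1, h)
    formula = closed_form(0.3, 0.7, 50, math.sqrt(2), 1, h)
    print(circle_distance(direct[0], formula[0]), circle_distance(direct[1], formula[1]))
```

Both printed distances should be at the level of rounding error. The comparison is done on the circle because the two computations reduce modulo 1 at different stages.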

By the Bourgain-Sarnak-Ziegler theorem, we know the difficulty is concentrated in dealing with the exponential sum

S_{p,q}(N)=\sum_{n=1}^N\mu(n)e^{\phi(n)+\sum_{m\in Z}e(mx)\hat H(m)(\frac{e(npm\alpha)-1}{e(m\alpha)-1}- \frac{e(nqm\alpha)-1}{e(m\alpha)-1})}

for all pairs p,q of sufficiently large primes.

A much simpler case is the affine map T:(x,y)\to (x+\alpha,cx+y+\beta) on \mathbb T^2, and more generally T:(x_1,...,x_n)\to A(x_1,...,x_n), where A is an upper triangular matrix with 1 on the diagonal, i.e. A=I+B with B nilpotent. In this case the Sarnak conjecture is reduced, by the B-S-Z theorem, to a Davenport-type estimate for exponential sums:

|\sum_{n=0}^{N}e^{2\pi if(n)}|\leq c_A\frac{N}{(\log N)^A}, \forall A>0.

Interval exchange map

For the interval exchange map, we can describe it as a composition of rotations of parts of S^1, carried out step by step, together with a renormalization process which glues neighboring rotations.

Now let us say a little more about this interesting dynamical system. We focus on the simplest nontrivial case, the 3-interval exchange map. In this case we just consider permutations of the intervals I_1,I_2,I_3, and it is easy to see that only one case is nontrivial, namely the permutation I_1\to I_3,I_2\to I_2,I_3\to I_1. We say a little more about the other, trivial cases:

When  I_1\to I_2,I_2\to I_3,I_3\to I_1, the interval exchange map is just a rotation, for which the Sarnak conjecture follows from:

|\sum_{n=0}^{N}e^{2\pi in\alpha}\mu(n)|=o(N), \forall \alpha\in R.

This is Davenport's estimate. It is not a formal triviality: the geometric series identity \sum_{n=0}^{N-1}e^{2\pi in\alpha}=\frac{1-e^{2\pi iN\alpha}}{1-e^{2\pi i\alpha}} only handles the sum without the factor \mu(n).

For the case $I_1\to I_2, I_2\to I_1, I_3\to I_3$ the map T is a rotation on I_1\cup I_2 and the identity map on I_3, and the orbit of a point lies in only one of $I_1\cup I_2, I_3$, depending on which of them contains the initial point x.

Now we focus on the most difficult situation. It is annoying, but it is the obstacle we must overcome to go further. Fortunately it can be explained by the following picture.

The 3-interval exchange map as two rotation maps glued by a renormalization map.


Let us explain what happens in the picture. It mainly expresses one identity, which shows how to view the 3-interval exchange map as a composition of rotation maps with a renormalization map gluing them. Rotations are maps we understand well, but we do not understand the renormalization map very well: it glues the two endpoints of I_2,I_3 which are not their common endpoint. One then gets two circles glued like an “8” , and T_2 just rotates one of them and leaves the other invariant.

Now we can roughly say what we need to control. It is just:

\sum_{n=0}^{N}f((T_1\circ R\circ T_1)^n(x))\mu(n)=o(N).

Now we do some calculations with this geometric description of the interval exchange map.

Let A=I_1, B=I_2\cup I_3; then A\cap B=\emptyset, A\cup B=[0,1], and |A|=\alpha, 0<\beta<|B|. The rotations are T_1:x\to x-\alpha, T_2:x\to x+\beta.



Standard model

Is there a standard model for zero-entropy dynamical systems?

This problem seems too ambitious, but it arises naturally when I try to get a global understanding of the Sarnak conjecture.


Baragar-Bourgain-Gamburd-Sarnak conjecture

M is the set of Markov triples (x,y,z):

x^2+y^2+z^2=3xyz and (x,y,z)\in \mathbb Z^3  \ \ \ \  (*).

It is easy to see that

R_1: (x,y,z)\to (3yz-x,y,z)

maps Markov triples to Markov triples.

The same is true for R_2,R_3. A classical result of Markov claims that all positive solutions of (*) can be generated from (1,1,1) by the transformations R_1,R_2,R_3 and permutations. I obtained a similar result for a similar algebraic equation a year and a half ago, when considering a Q version of a problem about 1-forms posed by Xu Bin.

So we know that the graph with root (1,1,1) and nodes generated by the transformations R_1\cup R_2 \cup R_3 \cup S_3 is connected.

The B-B-G-S conjecture asks: is this connectedness property still true modulo p for every sufficiently large prime p?
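Both the integer graph and its reduction modulo p are small enough to explore by machine. The sketch below is an illustration under assumptions: it uses the equation in the form x^2+y^2+z^2=3xyz (the form preserved by R_1), it removes the trivial solution (0,0,0) modulo p, it only tries a few small primes, and all function names are hypothetical.

```python
from itertools import permutations


def neighbours(t, p=None):
    """Images of a triple under R_1, R_2, R_3 and all permutations (reduced mod p if given)."""
    x, y, z = t
    moved = [(3 * y * z - x, y, z), (x, 3 * x * z - y, z), (x, y, 3 * x * y - z)]
    moved += list(permutations(t))
    if p is not None:
        moved = [tuple(c % p for c in s) for s in moved]
    return moved


def markov_triples_up_to(bound):
    """Positive integer triples reachable from (1, 1, 1) with all entries <= bound."""
    seen = {(1, 1, 1)}
    stack = [(1, 1, 1)]
    while stack:
        t = stack.pop()
        for s in neighbours(t):
            if s not in seen and all(0 < c <= bound for c in s):
                seen.add(s)
                stack.append(s)
    return seen


def component_mod_p(p):
    """Connected component of (1, 1, 1) in the graph of solutions modulo p."""
    start = (1 % p, 1 % p, 1 % p)
    seen = {start}
    stack = [start]
    while stack:
        t = stack.pop()
        for s in neighbours(t, p):
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return seen


def all_solutions_mod_p(p):
    """All nonzero solutions of x^2 + y^2 + z^2 = 3xyz modulo p, by brute force."""
    return {
        (x, y, z)
        for x in range(p) for y in range(p) for z in range(p)
        if (x * x + y * y + z * z - 3 * x * y * z) % p == 0 and (x, y, z) != (0, 0, 0)
    }


if __name__ == "__main__":
    print(sorted({c for t in markov_triples_up_to(100) for c in t}))
    for p in (5, 7, 11, 13):
        print(p, component_mod_p(p) == all_solutions_mod_p(p))
```

The first line printed lists the Markov numbers up to 100. The remaining lines compare, for each small prime, the component of (1,1,1) with the full set of nonzero solutions; equality means the graph modulo p is connected for that prime.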

Metric entropy 2

I am reading the article “ENTROPY THEORY OF GEODESIC FLOWS”.

Now we focus on the upper semicontinuity of the metric entropy map. The object we investigate is (X,T,\mu), where \mu is a T-invariant measure.

What makes this kind of problem interesting is that it is part of a variational problem, about the existence of a certain object in a certain moduli space at which some quantity attains a critical value (maximum or minimum). The simplest examples are perhaps the isoperimetric inequality and the Dirichlet principle for the Laplacian. In any case, a classical approach to establishing such an existence result is to prove upper semicontinuity and boundedness of the energy associated with the problem. In our case the semicontinuity is a statement about the regularity of the entropy map:

E:M(X,T)\to \mathbb R, \mu\mapsto h_{\mu}(T).

We define the entropy at infinity:

\sup_{(\mu_n)}\limsup_{\mu_n\to 0}h_{\mu_n}(T)

Where (\mu_n)_{n=1}^{\infty} varies over all sequences of measures converging to 0, in the sense that for every measurable A\subset X, \lim_{n\to \infty} \mu_{n}(A)=0.

Compact case

We say something about the compact case. In this case we have finite partitions into smaller and smaller cubes, which can be understood as a sequence of smaller and smaller scales. An example illustrating the difference is (\mathbb N^{\mathbb N},\sigma), the shift map on a countable alphabet.

Because of this, there is a good symbolic model, namely h-expansiveness, and its generalization, asymptotic h-expansiveness. For an asymptotically h-expansive system on a compact metric space $X$, the corresponding entropy map has been proved to be upper semicontinuous.

In particular, C^{\infty} diffeomorphisms of compact manifolds are asymptotically h-expansive.



A natural problem that I do not understand very well:

Why is it natural to assume the measure is a probability measure on a non-compact space?


Non-compact case

(X,d) metric space

T:X\longrightarrow X is a continuous map.

d_n(x,y)=\sup_{0\leq k\leq n-1}d(T^kx,T^ky), then d_{n} is still a metric.

It is easy to see that \frac{1}{n}h_{\mu}(T^n)=h_{\mu}(T). This identity can be proved using the characterization of entropy by \delta-separated sets and \delta-covering sets.


Katok's theorem:

Let X be compact. For every ergodic measure \mu the following formula holds:

h_{\mu}(T)=\lim_{\epsilon \to 0}\limsup_{n\to \infty}\frac{1}{n}\log N_{\mu}(n,\epsilon,\delta).

Where h_{\mu}(T) is the measure-theoretic entropy of \mu and N_{\mu}(n,\epsilon,\delta) is the minimal number of d_n-balls of radius \epsilon needed to cover a set of \mu-measure at least 1-\delta.

Riquelme proved that the same formula holds for Lipschitz maps on topological manifolds.



Let M(X,T) denote the space of T-invariant probability measures.

Let M_e(X,T) denote the space of ergodic T-invariant probability measures.

Simplified entropy formula:

(X,d,T) satisfies the simplified entropy formula if, for all sufficiently small \epsilon >0, all \delta\in (0,1) and all \mu\in M_e(X,T),

h_{\mu}(T)=\limsup_{n\to \infty}\frac{1}{n}\log(N_{\mu}(n,\epsilon,\delta)).

Simplified entropy inequality:

(X,d,T) satisfies the simplified entropy inequality if, for all sufficiently small \epsilon>0, all \mu \in M_{e}(X,T) and all \delta\in (0,1),

h_{\mu}(T)\leq \limsup_{n\to \infty}\frac{1}{n}\log(N_{\mu}(n,\epsilon,\delta)).

Weak entropy density:

M_e(X,T) is weakly entropy dense in M(X,T) if \forall \lambda>0, \forall \mu\in M(X,T), \exists \mu_n\in M_e(X,T) satisfying:

  1. \mu_n\to \mu weakly.
  2. h_{\mu_n}(T)>h_{\mu}(T)-\lambda.

Metric entropy 1

Some basic material, including the definition of metric entropy, was introduced in an earlier post of mine.

Among other things, there are some points we need to focus on:

1. The definition of metric entropy and, more generally, topological entropy.

2. The description of entropy via spanning sets and separated sets.

3.amernov theorem:


Now we state the result of Margulis and Ruelle:

Let M be a compact Riemannian manifold, f:M\to M a diffeomorphism and m an f-invariant measure.

The entropy is always bounded above by the sum of the positive exponents, i.e.

h_{m}(f)\leq \int_{M}\sum_{i}\lambda_i^{+}(x)dimE_i(x)dm(x).

Where dimE_i(x) is the multiplicity of \lambda_i(x) and a^{+}=max(a,0).

Pesin showed that the inequality is in fact an equality if f\in C^2 and m is equivalent to the Riemannian measure on M. So this is also sometimes known as Pesin’s formula.
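A standard worked example, added here only as an illustration (the choice of the cat map is mine, not the text's): for the hyperbolic toral automorphism induced by the matrix A below, with m the Lebesgue measure on the 2-torus, both sides can be computed by hand and the inequality is an equality.

```latex
A=\begin{pmatrix} 2 & 1\\ 1 & 1\end{pmatrix},\qquad
\text{eigenvalues } \frac{3\pm\sqrt5}{2},\qquad
\lambda_1=\log\frac{3+\sqrt5}{2}>0>\lambda_2=\log\frac{3-\sqrt5}{2},
\\[4pt]
h_m(f_A)=\log\frac{3+\sqrt5}{2}\approx 0.9624
=\int_{\mathbb T^2}\lambda_1^{+}\,\dim E_1\,dm .
```

Here the exponents are constant, each with multiplicity one, so the integral reduces to the single positive exponent.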

F. Ledrappier and L.-S. Young generalized the result of Pesin.

One of their main results is:

f:M\to M is a C^2 diffeomorphism, where M is a compact Riemannian manifold, f is compatible with the Lebesgue measure on M, and

h_{m}(f)=\int_{M}\sum_{i}\lambda_i^{+}(x)dimE_i(x)dm(x)

if and only if, on the canonically defined quotient manifold $M/W_{\mu}$, i.e. the manifold modulo the unstable manifolds $W_{\mu}$, the induced conditional measure m_{\xi} is absolutely continuous.

Remark: as I understand it, the equality just means that in some sense we have the reverse estimate:

h_{m}(f,\mu)\geq \int_{M}\sum_i\lambda_idim(V_i)dm.

This result may just mean that near the fixed points of f, i.e. the places which control the topology of the foliation, we have the reverse estimate. Such a reverse estimate leads to control of the singularities of the push-forward measure m_{\xi} on the quotient manifold, so m_{\xi} has good regularity. But this idea is not enough to solve the problem completely.

Now we give a geometric explanation, which leads to a rigorous proof of the inequality:

h_{m}(f)\leq \int_{M}\sum_{i}\lambda_i^{+}(x)dimE_i(x)dm(x).

First we observe that the long time average \lim_{n\to \infty}\frac{1}{n}log||Df^n|| of Df can be diagonalized. Assume that after diagonalization the eigenvalues are

\lambda_1\leq \lambda_2\leq \lambda_3\leq...\leq \lambda_{n-1}\leq \lambda_n.

These eigenvalues can be divided into 3 parts: <0,=0,>0.

This leads to a direct sum decomposition of the tangent bundle TM:

TM\simeq E_{u}\oplus E_s\oplus E_c.

Where $E_u$ is the part corresponding to the eigenvalues >0. For this part we consider the finer decomposition:

E_u=\oplus_{k=1}^rV_k, where V_k is the eigenspace of \lambda_k. The dimension of $V_k$ is $dim V_k$.

On the other hand, we have an identity for the metric entropy:

h_{m}(f)=\frac{1}{n}h_{m}(f^n)=\sup_{\alpha\in partition \ set}\frac{1}{n}h_m(f^n,\alpha).

For the latter, \alpha is a measurable partition of M; \alpha can always be refined to a finer partition \beta, and we have:

h_{m}(f,\alpha)\leq h_{m}(f,\beta).

Now we arrive at the central point of the proof:

Every partition can be refined by a partition in which the boundaries of almost all cubes are parallel to the foliation. So we restrict ourselves to a partition \beta in which all boundaries of the cubes are parallel to the eigenvectors.

In this situation we only need to estimate the number of elements of \vee_{i=1}^nT^i\beta. Estimating it is not very difficult; we only need to observe the following two things:


\lim_{n\to \infty}\frac{1}{n}log||Df^n|| exists a.e. in M. This leads to a definition of the foliation almost everywhere, i.e. outside a set of measure zero. In fact this set is the set of fixed points of f in M.


After a rescaling, every point which is not a fixed point of f can be regarded as being far away from the fixed points. Then the foliation can be understood locally as a product space. The flow in the directions where the eigenvalue is less than 1 cannot change \vee_{i=1}^nT^i\beta. The directions with eigenvalue equal to 1 are just translations and only change the number of elements of \vee_{i=1}^nT^i\beta with polynomial growth. The central point is that the directions with eigenvalue larger than 1 make \vee_{i=1}^nT^i\beta grow at rate e^{\lambda_i}. Taking the product, we get:

h_{m}(f,\mu)\leq \int_{M}\sum_i\lambda_i dim(V_i)dm.

In fact the proof only needs f to be C^1.