Автор: Knapp A.W.  

Теги: mathematics   cryptography   natural sciences   elliptic curves  

ISBN: 0-691-08559-5

Год: 1992

Текст
                    ELLIPTIC CURVES
BY
ANTHONY W. KNAPP
MATHEMATICAL NOTES
PRINCETON UNIVERSITY PRESS


Elliptic Curves by Anthony W. Knapp Mathematical Notes 40 Princeton University Press Princeton, New Jersey 1992
Copyright © 1992 by Princeton University Press ALL RIGHTS RESERVED The Princeton Mathematical Notes are edited by Luis A. Caffarelli, John N. Mather, and Elias M. Stein Princeton University Press books are printed on acid-free paper, and meet the guidelines for permanence and durability of the Committee on Production Guidelines for Book Longevity of the Council on Library Resources Printed in the United States of America Library of Congress Cataloging-in-Publication Data Knapp, Anthony W. Elliptic curves / by Anthony W. Knapp. p. cm.—(Mathematical notes ; 40) Includes bibliographical references and index. ISBN 0-691-08559-5 (PB) 1. Curves, Elliptic. I. Title. II. Series. Mathematical notes (Princeton University Press); 40. QA567.2.E44K53 1993 516.3'52--<Jc20 92-22183 3 5 79 10 8642
To Susan
CONTENTS List of Figures List of Tables Preface Standard Notation I. Overview II. Curves in Projective Space 1. 2. 3. 4. 5. Projective Space Curves and Tangents Flexes Application to Cubics Bezout's Theorem and Resultants III. Cubic Curves in Weierstrass Form 1. 2. 3. 4. 5. Examples Weierstrass Form, Discriminant, j-invariant Group Law Computations with the Group Law Singular Points IV. Mordell's Theorem 1. 2. 3. 4. 5. 6. 7. 8. 9. Descent Condition for Divisibility by 2 E(Q)/2£(Q), Special Case £(Q)/2£(Q), General Case Height and MordelPs Theorem Geometric Formula for Rank Upper Bound on the Rank Construction of Points in E(Q) Appendix on Algebraic Number Theory V. Torsion Subgroup of £(Q) 1. 2. 3. 4. 5. 6. Overview Reduction Modulo p p-adic Filtration Lutz-Nagell Theorem Construction of Curves with Prescribed Torsion Torsion Groups for Special Curves X X xi xv 19 24 32 40 44 50 56 67 74 77 80 85 88 92 95 102 107 115 122 130 134 137 144 145 148
vin CONTENTS VI. Complex Points 1. Overview 151 2. Elliptic Functions 152 3. Weierstrass p Function 153 4. Effect on Addition 162 5. Overview of Inversion Problem 165 6. Analytic Continuation 166 7. Riemann Surface of the Integrand 169 8. An Elliptic Integral 174 9. Computability of the Correspondence 183 VII. Dirichlet >s Theorem 1. Motivation 189 2. Dirichlet Series and Euler Products 192 3. Fourier Analysis on Finite Abelian Groups 199 4. Proof of Dirichlet's Theorem 201 5. Analytic Properties of Dirichlet L Functions 207 VIII. Modular Forms for ST(2,Z) 1. Overview 221 2. Definitions and Examples 222 3. Geometry of the q Expansion 227 4. Dimensions of Spaces of Modular Forms 231 5. L Function of a Cusp Form 238 6. Petersson Inner Product 241 7. Hecke Operators 242 8. Interaction with Petersson Inner Product 250 IX. Modular Forms for Hecke Subgroups 1. Hecke Subgroups 256 2. Modular and Cusp Forms 261 3. Examples of Modular Forms 265 4. L Function of a Cusp Form 267 5. Dimensions of Spaces of Cusp Forms 271 6. Hecke Operators 273 7. Oldforms and Newforms 283 X. L Function of an Elliptic Curve 1. Global Minimal Weierstrass Equations 290 2. Zeta Functions and L Functions 294 3. Hasse's Theorem 296
CONTENTS IX XI. Eichler-Shimura Theory 1. Overview 302 2. Riemann surface Xq(N) 311 3. Meromorphic Differentials 312 4. Properties of Compact Riemann Surfaces 316 5. Hecke Operators on Integral Homology 320 6. Modular Function j(r) 333 7. Varieties and Curves 341 8. Canonical Model of X0(N) 349 9. Abstract Elliptic Curves and Isogenics 359 10. Abelian Varieties and Jacobian Variety 367 11. Elliptic Curves Constructed from S2{r0{N)) 374 12. Match of L Functions 383 XII. Taniyama-Weil Conjecture 1. Relationships among Conjectures 386 2. Strong Weil Curves and Twists 392 3. Computations of Equations of Weil Curves 394 4. Connection with Fermat's Last Theorem 397 Notes 401 References 409 Index of Notation 419 Index 423
LIST OF FIGURES 1.1 Method for obtaining Q solutions of x2 -f y2 — 1 6 1.2 Newton's explanation of the Diophantus method 10 1.3 Chord-tangent composition rule 11 1.4 Group laws relative to different base points O 11 1.5 Negatives relative to the group law 12 1.6 Singular behavior 13 1.7 Graphs of R points of the elliptic curve y2 = P(x) 14 3.1 Configuration for (PP')(QQf) = (PQ)(P'Q') 69 8.1 Fundamental domain for 5L(2, Z) 228 8.2 Contour for calculating number of zeros 232 9.1 Fundamental domain for r0(2) 260 LIST OF TABLES 1.1 Values of H'p<R N{p)/p for y2 = x3 + ax 17 3.1 Some elliptic curves with small negative discriminant 64 3.2 Some elliptic curves with small positive discriminant 65 4.1 Image of <p for y2 = x3 - Ax with x £ {-2,0,2} 111 4.2 Image of <p for y2 = x3 - 4x with x G {-2,0,2} 111 4.3 Adjusted image of <p for y2 = x3 - Ax 111 4.4 Image of ip for y2 = x3 — p2x with x ^ {—p,0,p} 113 4.5 Adjusted image of <p for y2 = x3 — p2x 113 4.6 Some primes p = 5 mod 8 that are congruent numbers 117 4.7 Effect on E : y2 = x3 + 8x of Fermat's descent 121 5.1 Examples of torsion subgroups of E(Q) 133 10.1 Effect of an admissible change of variables 291 12.1 Conductors of some elliptic curves 391 12.2 Curves X0(N) of low genus 395
PREFACE For introductory purposes, an elliptic curve over the rationals is an equation y2 = P(x)> where P is a monic polynomial of degree three with rational coefficients and with distinct complex roots. The points on such a curve, together with a point at infinity, form an abelian group under a geometric definition of addition. Namely if we take two points on the curve and connect them by a line, the line will intersect the curve in a third point. The reflection of that third point in the x-axis is taken as the sum of the given points. The identity is the point at infinity. According to Mordeirs Theorem, the abelian group of points on the curve with rational coordinates is finitely generated. A theorem of Lutz and Nagell describes the torsion subgroup completely, but the rank of the free abelian part is as yet not fully understood. This is the essence of the basic theory of rational elliptic curves. The first five of the twelve chapters of this book give an account of this theory, together with many examples and number-theoretic applications. This is beautiful mathematics, of interest to people in many fields. Except for one small part of the proof of Mordeirs Theorem, it is elementary, requiring only undergraduate mathematics. Accordingly the presentation avoids most of the machinery of algebraic geometry. A related theory concerns elliptic curves over the complex numbers, or Riemann surfaces of genus one. This subject requires complex variable theory and is discussed in Chapter VI. It leads naturally to the topic of modular forms, which is the subject of Chapters VIII and IX. But the book is really about something deeper, the twentieth-century discovery of a remarkable connection between automorphy and arithmetic algebraic geometry. This connection first shows up in the coincidence of L functions that arise from some very special modular forms ("automorphic" L functions) with L functions that arise from number theory ("arithmetic" or "geometric" L functions, also called "motivic"). Chapter VII introduces this theme. The automorphic L functions have manageable analytic properties, while the arithmetic L functions encode subtle number-theoretic information. The fact that the arithmetic L functions are automorphic enables one to bring a great deal of mathematics to bear on extracting the number-theoretic information from the L function. The prototype for this phenomenon is the Riemann zeta function £(s), which should be considered as an arithmetic L function defined initially xi
Xll PREFACE for Re s > 1. An example of subtle number-theoretic information that £(s) encodes is the Prime Number Theorem, which follows from the nanvanishing of £(s) for Re s = 1. In particular, this property of £(s) is a property of points s outside the initial domain of C(5)- To get a handle on analytic properties of C(5)> one proves that £(s) has an analytic continuation and a functional equation. These properties are completely formal once one establishes a relationship between £(s) and a theta function with known transformation properties. Establishing this relationship is the same as proving that £(s) is an automorphic L function. The main examples of Chapter VII are the Dirichlet L functions L(S)X)- These too are arithmetic L functions defined initially for Re s > 1. They encode Dirichlet's Theorem on primes in arithmetic progressions, which follows from the nonvanishing of all L(s> \) at s = 1. As with C(s)> tne relevant properties of L(s,\') ar€ outside the initial domain. Also as with C(5)> °ne gets at the analytic continuation and functional equation of L(stx) by identifying L(s>x) with an automorphic L function. The examples at the level of Chapter VII are fairly easy. Further examples, generalizing the Dirichlet L functions in a natural way, arise in abelian class field theory, are well understood even if not easy, and will not be discussed in this book. The simplest L functions that are not well understood come from elliptic curves. An elliptic curve has a geometric L function L(s,E) initially defined for Re s > |. An example conjecturally of the subtle information that L(s> E) encodes is the rank of the free abelian group of rational points on the curve. This rank is believed to be the order of vanishing of L(s,E) at s = 1. Once again, the relevant property of L(s, E) is outside the initial domain. To address the necessary analytic continuation, one would like to know that L(s, E) is an automorphic L function. Work of Eichler and Shimura provides a clue where to look for such a relationship. Eichler and Shimura gave a construction for passing from certain cusp forms of weight two for Hecke subgroups of the modular group to rational elliptic curves. Under this construction, the L function of the cusp form (which is an automorphic L function) equals the L function of the elliptic curve. The Taniyama- Weil Conjecture expects conversely that every elliptic curve arises from this construction, followed by a relatively simple map between elliptic curves. This conjecture appears to be very deep; a theorem of Frey, Serre, and Ribet says that it implies Fermat's Last Theorem. The final three chapters discuss these matters; the last two take for granted more mathematics than do the earlier chapters. If the theme were continued beyond the twelve chapters that are here,
PREFACE xni eventually it would lead to the Langlands program, which brings in representation theory on the automorphic side of this correspondence. As a representation theorist, I come to elliptic curves from the point of view of the Langlands program. Although the book neither uses nor develops any representation theory, elliptic curves do give the simplest case of the program where the correspondence of L functions is not completely understood. Furthermore representation-theoretic methods occasionally yield results about elliptic curves that seem inaccessible by classical methods. From my point of view, they are an appropriate place to begin to study and appreciate the Langlands program. A beginning guide to the literature in this area appears in the section of Notes at the end of the book. This book grew out of a brilliant series of a half dozen lectures by Don Zagier at the Tata Institute of Fundamental Research in Bombay in January 1988. The book incorporates notes from parts of courses that I gave at SUNY Stony Brook in Spring 1989 and Spring 1990. The organization owes a great deal to Zagier's lectures, and I have reproduced a number of illuminating examples of his. I am indebted to Zagier for offering his series of lectures. Much of the mathematics here can already be found in other books, even if it has not been assembled in quite this way. Some sections of this book follow sections of other books rather closely. Notable among these other books are Fulton [1989],* Hartshorne [1977], Husemoller [1987], Lang [1976] and [1987], Ogg [1969a], Serre [1973a], Shimura [1971a], Silverman [1986], and Walker [1950]. The expository paper Swinnerton- Dyer and Birch [1975] was also especially helpful. Detailed acknowledgments of these dependences may be found in the section of Notes at the end. In addition, I would like to thank the following people for help in various ways, some large and some small: H. Farkas, N. Katz, S. Kudla, R. P. Langlands, S. Lichtenbaum, H. Maturnoto, C.-H. Sah, V. Schechtman, and R. Stingley. The type-setting was by A/viS-T^K, and the Figures were drawn with Mathematical. Financial support in part was from the National Science Foundation in the form of grants DMS 87-23046 and DMS 91-00367. A. W. Knapp January, 1992 *A name followed by a bracketed year is an allusion to the list of References at the end of the book. The date is followed by a letter in case of ambiguity.
STANDARD NOTATION Item Meaning #5 or |5| 0 Ac n positive Z, Q, R, C Re* Im z 0(1) o(l) = r**t Im a = b mod m a\b GCD(a,6) 1 1 or / dimV V Endfc(K) GL{n,k) 5L(2,R), 5L(2,Z) Tr,4 >itr /2X k [A:B] Autk(K) Ee »i number of elements in S empty set A complement n > 0 integers, rationals, reals, complex numbers real part of z imaginary part of z bounded term term tending to 0 approximately numerically equal asymptotic to, with ratio tending to 1 integers modulo m m divides a — b a divides 6 greatest common divisor multiplicative identity identity matrix dimension of vector space dual of vector space linear maps of vector space to itself general linear group over a field k group of 2-by-2 matrices of determinant 1 trace of A transpose of A multiplicative group of invertible elements algebraic closure of k index of B in A, or degree of A in B automorphism group of K fixing k direct sum (for emphasis) fundamental group
Elliptic Curves
CHAPTER I OVERVIEW Diophantus lived in Alexandria around 250 A.D. He published a series of books, Arithmetica, in 13 volumes, which were lost for more than a thousand years. They were found again about 1570, and in the next century Fermat studied a translation. Despite the passage of so much time, much of Diophantus Js work had still not been rediscovered. The books are in the style of problems and solutions. The early volumes introduced, apparently for the first time, algebraic notation and equations, as well as negative numbers. Later volumes dealt with number theory. The work of Diophantus is so stunning that equations to be solved in number theory are often called Diophantine equations in his honor. Basic Problem (affine). We consider the locus f(x}y) = 0 with / a nonzero polynomial. To fix the ideas, think of Q coefficients and of solutions with x and y in Q. Clearing denominators, we can always adjust / so that we have Z coefficients and are seeking Q solutions, i.e., solutions with x and y in Q. Basic Problem (projective). We study the locus F(xyy}w) = 0 with F a nonzero homogeneous polynomial of some degree </, and we seek "projective" Q solutions. This means we identify solutions (x*, ?/, w) and (Ax,Aj/, \w) for A / 0 and we discard (0,0,0). Rational solutions automatically give us integer solutions, by clearing denominators. The two problems are related, in that we can pass from each to the other. Two examples will illustrate. EXAMPLE 1. Fermat equation xd + yd = zd. This equation is projective. We can reduce to the affine case by dividing by zd: ud + vd = 1. Relatively prime integer solutions of xd -f yd — zd (with z ^ 0) correspond to rational solutions of ud + vd — 1 provided that we identify a solution (x,t/, z) with its negative (—xy— j/, — z). This process of reducing to the affine case loses "affine solutions at oo," those with 2 = 0, but they are trivial here. Fermat's Last Theorem is the conjecture that the equation has no nontrivial solutions for d > 2. 3
4 I: OVERVIEW Example 2. The affine equation y2 = x3 + 1. This becomes a projective equation if we put y = ^ and x = ^. We get wv2 = u3 +w3. (Or we simply insert powers of w to make all terms cubic: wy2 = x3 + w3.) Diophantus considered the affine problem in degrees 1, 2, and 3. Case of degree 1. An affine line is of the form ax + by + cz = 0 with a and 6 not both 0. Two such lines are the same if and only if the coefficients of one are a multiple of the coefficients of the other. A projective line is of the form ax + by -f cw = 0 with a, 6, c not all 0. Again, two such lines are the same if and only if the coefficients of one are a multiple of the coefficients of the other. The line with a = b = 0 and c = 1 is w = 0, which is the "line at oo." Two facts are clear: 1) Q solutions exist (projectively). 2) We can parametrize all Q solutions. Case of degree 2. First we consider this case projectively. Then it turns out to be possible to make a linear change of variables so that the equation is ax2 + by2 + cw2 = 0. (1.1) The questions we shall address are: (I) Do (nonzero) solutions exist? (II) If solutions exist, how are they parametrized? Of these, only the second question was considered by Diophantus. We shall return to the second question shortly. We begin with a discussion of Question I, first giving two examples. Example 1. x2 + y2 -f z2 — 0 has no Q solutions since it has no R solutions. Example 2. x2-\-y2 = 3z2 has no Q solutions since it has no solutions modulo 9 in which some variable is not divisible by 3. This condition is necessary since existence of a Q solution implies existence of a relatively prime Z solution. To see that the condition fails, we argue as follows: Modulo 3, the equation is x2 + y2 = 0, which forces x = y = 0 mod 3. Then 32 divides x2 -f y2 and so 3 divides z. Thus 3 divides all of x, y, and z, a contradiction.
I: OVERVIEW 5 More generally, suppose that a,6,c are relatively prime integers and no square divides a coefficient. Then two necessary conditions for the existence of solutions of (1.1) can be seen to be that (1) (1.1) must have a nonzero solution in R (2) (1.1) must have a relatively prime solution modulo pm for each prime p and each integer m > 0. It turns out that (2) is automatic for all m if p is odd and does not divide abc. If p = 2 or p divides a6c, then the weaker necessary condition (3a) or (3b) below implies (2) for the corresponding p: (3a) when an odd prime p divides abc, (1.1) must have a relatively prime solution modulo p (3b) when p = 2, (1.1) must have a relatively prime solution modulo 16. Theorem 1.1 (Legendre). If (1) holds and if (3) holds for p = 2 and for all odd primes p dividing abc> then (1.1) has a Q solution. Remarks: i) (1) and (3) are decidable. So we can decide the existence question, Question I. (And, parenthetically, the hypothesis on p = 2 in the theorem is not needed.) ii) To fix the ideas, think of (1) and (2) as the conditions. The Hasse- Minkowski theorem generalizes the Legendre theorem to more variables: A homogeneous quadratic polynomial in n variables with integer coeff- cients has a nonzero solution in Q if and only if it has a nonzero solution in R and a relatively prime solution modulo pm for each prime and each integer m > 0. iii) The field of p-adic numbers, described below in §V.3, can be used to combine congruence information modulo pm for all m. With this device, the Hasse-Minkowski theorem says: A homogeneous quadratic polynomial in n variables with integer coefficients has a nonzero solution in Q if and only if it has a nonzero solution in R and in the p-adic numbers for every prime p. The idea that solutions over R and the p-adics ought to control solutions over Q is called the Hasse Principle. Unfortunately it is not universally true. As we shall see shortly, it fails even for homogeneous cubics in three variables. Nevertheless, the Hasse Principle is a useful place to begin the study of a class of projective equations over Q. We shall see better how the principle operates in Chapters V and VII. In the language of algebraic number theory, the p-adics and R are indexed by "places." A narrow (but still false) form of the Hasse Principle
6 I: OVERVIEW then says: If a projective equation is locally solvable at every place, then it is globally solvable. This terminology is supposed to suggest an analogy with what happens in the theory of several complex variables. In that theory, we consider holomorphic functions and make a ring out of those near a point (technically by passing to germs). The ring that we make is a "local ring" in the sense that it has a unique maximal ideal. The global ring is the ring of all globally holomorphic functions on a given open set. In number theory, we attempt the same thing. Our global number field is Q. To this we attach "places," analogous to points, by an abstract definition that we shall not pursue. The places work out to be the primes p and also oo. For each place, there is a "local field" - the p-adics and the reals. The sense in which the p-adics are a "local field" is that the p-adic integers form a "local ring," a ring with a unique maximal ideal. Local solvability is to be solvability in the local field for each place. The hope expressed in the Hasse Principle is that we can obtain global solutions from local ones, because finding local ones is easier. We turn to a discussion of Question II, the parametrization of solutions. Let us begin with Diophantus's method to solve x2 + y2 = 1. Take a particular solution (—1,0). Choose a linear relation it satisfies, say x = -1 + 2y. FIGURE 1.1. Method for obtaining Q solutions of x2 + y2 = 1 Substitution into x2 -f y2 — 1 gives (—1 + 2y)2 + y2 = 1, and we are led to 1 - 4y + Ay2 + y2 = l,5y - 4 = 0,y = f ,x = §. We obtain
I: OVERVIEW 7 (r,y) = (§, if) as a nontrivial solution. Effectively x = -1 + ty works for any rational t. We are led to <*•»>= (fefpfr)- (,2> Conversely any (z,y) must make a line with (-1,0) with ^ = some t, and (xyy) is forced to be of this form. The actual Diophantus method did not make reference to any geometry such as is Figure 1.1. The correspondence between algebra and geometry was unknown before Fermat and Descartes in the 17th century. Diophantus used this analysis to parametrize Pythagorean triples: The problem is to find all relatively prime solutions of a2 -f b2 — c2 over Z. For such a solution (a, 6, c) with a ^ 0, ^ _ 6/c _ 2t/(t2 -f 1) 2t 2m/n _ 2mn a " 'afc" (i2 - l)/(t2 + 1) " <2- 1 " (m/n)2 - 1 ~ ^"^n1" So (a, 6, ±c) = K(m2 - n2,2mn, m2 -f n2) with /v 6 Q. Below we shall tidy matters up and come to the following conclusion. Theorem 1.2 (Diophantus). In any Pythagorean triple (a,6,c), exactly one of a and 6 is odd. If a is odd and c is positive, then there exist integers m and n such that (a, 6, c) = On2 - n2,2mn, m2 + n2). (1.3) The integers m and n are relatively prime and not both odd. Conversely any pair (m}n) with these properties yields a Pythagorean triple via (1.3). PROOF. The given equation taken modulo 4 shows that a and 6 cannot both be odd. Thus suppose a is odd and 6 is even, and write as above (a,6,±c) = -(m2- n2,2mn,m2 + n2) with u/v in lowest terms. Clearing fractions, we see that u divides a, 6, and c. Hence u = ±1 and we may take u = +1. We now have v(a> 6, ±c) = (ra2 — n2,2?nn, m2 -f ?z2). We can eliminate the ambiguous sign by redefining (m,n,i;), leaving it alone if the sign is + or replacing it by ( — n,?7i,— v) if the sign is —. Then we obtain v(a,btc) = (m2 - n2,2?nn,?n2 + n2). {IA)
8 I: OVERVIEW Since c > 0, v > 0. Ifmandn have any common factor, the square of the factor will appear in v, and we can cancel. Thus we may assume that m and n are relatively prime. If p is an odd prime dividing v, then p divides both m2 — n2 and m2 + n2, hence both 2m2 and 2n2, hence both m and n. Thus no odd prime divides v. If 2 divides v, then the evenness of vb implies that 4 divides 2mn. Thus one of m or n must be even, and then the other must be odd. Hence m2 — n2 is odd, and va is odd, contradiction. Thus v = 1, and (1.4) reduces to (1.3). In (1.3) it is clear that m and n are not both odd. Conversely, such a relatively prime pair (mtn) trivially yields a Pythagorean triple. The method of Diophantus readily generalizes. Take f(xty) = 0 to be any quadratic in (x,y). Suppose (p, q) is a solution. Then y = q+t(x—p) and x — p give all lines through (p, q). Substitute for y. Get a quadratic equation for x with a rational solution (for each /), namely x = p. Then the other solution must be rational, etc. Without even carrying out the computations, we can draw an important qualitative conclusion: Proposition 1.3. If a quadratic equation f(xyy) = 0 over Q has one Q solution, it has infinitely many Q solutions. Exercise: Parametrize the relatively prime integer solutions (a,6,c) of2a2 + 62 = c2. Case of degree 3. First we consider this case protectively. There is no normal form in this generality. We shall address the same questions as in the case of degree 2: (I) Do Q solutions exist? (II) If Q solutions exist, how are they parametrized? The second of these questions was considered briefly by Diophantus. The following example shows for Question I that the Hasse Principle fails. EXAMPLE (Selmer): The projective equation 3x3 + 4y3+5z3 = 0 has a solution over R and over the />-adics for each py but it has no Q solution. Now let us discuss Question II. We begin by normalizing our cubic suitably. Taking Question I as settled, suppose that the homogeneous cubic F(x,yyw) = 0 has a Q solution. Suppose in particular that it
I: OVERVIEW 9 has an inflection point over Q in a sense that will be made precise in §11.3. If this inflection point is mapped to (x, y} w) = (0,1,0) by a linear transformation in such a way that the tangent line becomes the line at oo, and if the variables are scaled suitably, then the equation (in affine form) becomes y2 + aixy -+ a3j/ = x3 + a2x2 + a4x + a6. (Notice that (x,y,w) = (0,1,0) does solve this equation projectively.) Since Q does not have characteristic 2, we can complete the square in y and reduce matters to y2 = P(x), where degP = 3. Example: u3 + j/3 = w3. Put £ = 3f} ± = *=i. The inverse is 3u 9w x = , t/= . w — v w — v Then the equation becomes y2 - 9y = x3 - 27. Anyway we have a solution (the one at oo), and we want to know all solutions. Diophantus knew how to generate some solutions from others. EXAMPLE: Given a number (e.g., 6), divide it into two pieces whose product is a cube diminished by its root. Solution: Let the pieces be y and 6 — y. We want 2/(6 - y) = x3 - x. The point (xyy) = ( — 1,0) is a trivial solution. It lies on a line x = 2y—1. Substitution gives 6y - y2 = (2t/ - l)3 - (2y - 1) = 8y3 - \2y2 + 4y. Division by y gives a quadratic equation with irrational roots. We would surely have had rational roots if we could have made Ay on the right match 6y on the left. The 4 comes from twice the coefficient 2 of y. To get 6y, we should use coefficient 3 for y. So x = 3y — 1. Then 6y - y2 = 27y3 - 27y2 + 6y, 27y = 26.
10 I; OVERVIEW Soy =ff and* = 3y-l = ^. Newton's explanation of the method is as follows: Start with any line through (—1,0). Rotate it to be tangent. Then it intersects the given cubic twice at rational points; so the third intersection point must be rational. The picture in Figure 1.2 in the context of solutions over R makes matters much clearer. Figure 1.2. Newton's explanation of the Diophantus method In summary, for this (and other) cubic equations, we have a systematic method to generate new solutions from old. The method does not give parametric families. As with tz3-f v3 = w3 we may not even get infinitely many solutions this way. Let us analyze the method of Diophantus further, postponing detailed verifications to Chapters II and III. The above construction takes a rational solution P and produces another one, denoted PP or P • P, by means of a tangent line. More generally, if P and Q are distinct rational solutions as in Figure 1.3, so is R. We write R — PQ = P • Q. We have
I: OVERVIEW 11 a systematic way of getting a third point from two. This operation is called the chord-tangent composition rule. Figure 1.3. Chord-tangent composition rule (a) Picture for O = (0,1,0) (b) Picture for a different O FIGURE 1.4. Group laws relative to different base points O The chord-tangent composition rule is not applicable to certain pairs of points for a general cubic. But with an additional assumption of nonsingularity that we consider below, the rule is everywhere applicable and can be modified so as to give us an abelian group operation, as in Figure 1.4. For this purpose fix a point O on the curve. If (xyyyw) = (0,1,0) is on the curve (i.e., if there are no y3 terms), one frequently takes O = (0,1,0). With O fixed, we define P + Q = OPQ.
12 I: OVERVIEW Let us address the group axioms. Associativity of + is complicated and is postponed to Chapter III. But commutativity is clear. Also O is an identity since P + O = O • PO = P. To define negatives, form OO as the third point on the line tangent at O. Then -P = OO P. In fact, p + (-P) = O • P(-P) = 000 = 0. The geometry of negatives is especially nice if (0,1,0) is on the curve (i.e., again, if there are no j/3 terms). Then it turns out that the choice O = (0,1,0) leads to OO = O if there are no xy2 and x2y terms. Comparative pictures of negatives are in Figure 1.5. (a) Picture for O = (0,1,0) (b) Picture for a different O FIGURE 1.5. Negatives relative to the group law As we indicated above, there are difficulties if the curve is "singular." Figure 1.6 illustrates two types of singular behavior. In each case, P = (0,0), and we consider P + P. In (1), we have to take PP = P, and then P + P = OP is undefined since the line through O and P neither meets the curve at a third point nor is tangent to the curve at O or P. In (2), we do not have a unique definition of PP, since there is not a unique tangent line at P. The problem is that both curves look bad at P = (0,0). We cannot solve for y or x in terms of the other. The condition of the Implicit Function Theorem fails at (0,0) in that ^ = 0 = -§-. We say that (0,0) is a singular point and that the curve is singular. A precise definition of singularities will be given in §11.2.
I: OVERVIEW 13 (a) y2 = x3 (b) y2 = x2(x + l) Figure 1.6. Singular behavior Theorem 1.4 (Poincare). For a nonsingular cubic curve with a specified rational point O, the operation + is well defined and makes the set of rational solutions into an abelian group with identity element O. If a different identity element O' is chosen, then the two operations are related by P+'Q = P + Q — 0'y and the group structures are isomorphic. The geometry is especially simple when O is the same as OO, i.e., when O is an "inflection point." In this case we shall see that we can map O out to (x,y, w) = (0,1,0) by a linear mapping with coefficients in Q and we can arrange for the tangent line to be w = 0. After scaling, we get y2 + aixy + a3y = x3 + a2x2 + a4x + a6, which is called a Weierstrass form of the curve. An elliptic curve over Q is a nonsingular cubic curve in Weierstrass form, with rational coefficients. Wider classes of curves can be reduced to elliptic curves in various ways, and elliptic curves are sometimes defined as one of these more general curves. To visualize an elliptic curve over Q we can complete the square in the Weierstrass form and change variables in y to obtain y2 = P(x), where P(x) is a monic cubic with distinct roots. We can graph u = P(x) and obtain a picture as in Figure 1.7a or 1.7b. (Alternatively the graph might have its dip completely below the x-axis, or it might have no maximum or minimum.) Then we can scale by y = ±y/u and obtain the pictures in Figures 1.7c and 1.7d.
14 I: OVERVIEW (a) u = i(x2-l) (b) u = x{x7-l) + \ (c) y2 = x(x2 - 1) (d) y2 = x(x2 - 1) + \ Figure 1.7. Graphs of R points of the elliptic curve y2 = P(x) These pictures are of the R points in the affine plane. Projectively we must include the point at infinity, (0,1,0). Then topologically (and smoothly) we get one or two circles. This turns out to mean that the group of R points is S1 or S1®Z2- So the group of Q points is asubgroup of Sl or 51 ® Z2. The first main structure theorem about the group of Q points is as follows. Theorem 1.5 (Mordell). The group of Q points of an elliptic curve over Q is finitely generated. Hence it is Zr <£ F with F finite abelian. This theorem will be proved in Chapter IV. Since the finite abelian group F is contained in Sl or Sl 0 Z2, we must have F = Zn or F = Z2n®Z2. For any particular curve, F can be determined. We can change variables to make the curve be y2 = x3 + Ax + B, with A> B £ Z. It
I: OVERVIEW 15 simplifies calculations to take O = (0,1,0). Then the elements of order 2 are those with y = 0, at most three in number. For the rest we can use Theorem 1.6 (Lutz-Nagell). For an elliptic curve y2 = x3 -f Ax -f B over Q with O = (0,1,0), the finite abelian group F can be computed as follows. If P = (x(P)yy(P)) is in F but does not have order 1 or 2, then (a) x(P) and y(P) are in Z (b) y(P)2 divides 4A3 + 27 J32. This theorem will be proved in Chapter V. To compute F, we simply can try all integers y whose square divides 4A3 -f TIB2. For each such, we seek an integer solution x of y2 = x3 -f Ax + 5. We can tabulate all integer pairs (x,y) obtained in this way, and we have a bound for the order |F|. Then we can check which elements (x,y) are of finite order since their orders (if finite) cannot exceed |^|. As the elliptic curve varies, we can ask what finite abelian groups F arise in Mordell's Theorem. The following deep theorem addresses this question. Theorem 1.7 (Mazur). The torsion subgroup of the Q points of an elliptic curve over Q is one of the 15 groups Zn with n= 1,2,3,...,9,10,12 Z2n0Z2 with n = 1,2,3,4. We turn to a discussion of the rank r of the group of Q points on an elliptic curve over Q. Elliptic curves are known for which r is any of the numbers 0,1,2,..., 12. The proof of MordelFs Theorem gives an upper bound for r for any particular curve. Still, no effective way is known for deciding whether r = 0 for a particular curve, and it is not known whether r can be arbitrarily large. The motivation for what affects rank is based on computer calculations and on the hope expressed in the Hasse Principle. Let us consider an elliptic curve E given by y2 = F(x)} (1.5) where F(x) is a cubic polynomial with integer coefficients and with distinct roots 7*i,r2,r3 in C. The discriminant of F is flio'C7** "~ ri)2 anc* is a nonzero integer. The group of Q points is denoted E(Q).
16 I: OVERVIEW Guided by the Hasse principle, we consider (1.5) modulo a prime p. The curve (1.5) over Zp will be nonsingular (hence will be an elliptic curve) except when p = 2 or when p divides the discriminant of F. For nonexceptional p, we let N(p) be the number of Zp solutions (x,y), including the point at infinity. For example, y2 — x3 -f 3x over Z5 has solutions (0,0) (1,±2), (2, ±2), (3,±1), (4,±1), oo, and thus N(5) = 10. In (1.5), we always have 1 < W(p)<2p + 1. (1.6) In fact, the lower bound is a consequence of counting the point at infinity. For the upper bound, we note that each x from 0 to p— 1 gives at most 2 values of t/, and we also must count the point at infinity. We can rewrite (1.6) as \P+1-N(P)\<P. (1-7) So N(p) is centered about p-f 1. A theorem of Hasse says that it is more closely centered about p-f 1 than (1.7) indicates. Theorem 1.8 (Hasse). |p+ 1 - N(p)\ < 2^/p. The philosophy of the Hasse Principle is that local solutions should influence global solutions. So a curve is more likely to have many global solutions if N(p) is relatively large for many p. In Table 1.1, we consider the equation y2 = x3 + ax for several choices of the integer a. For each a, we form the product p<R as a function of R. The symbol ; means that the exceptional primes (2 and those dividing the discriminant) are not to be included. The table lists the values of this product for some Rys in approximately a geometric progression.
I: OVERVIEW 17 a -3 -4 +4 +6 -8 +3 -6 +8 -14 +14 -82 r 0 0 0 0 0 1 1 1 1 2 3 R= 17 .469 1.926 1.540 .451 .602 5.867 1.625 5.416 1.095 5.476 7.823 31 .624 1.506 2.409 .706 .612 5.964 1.270 7.199 1.456 5.566 12.234 53 .576 1.533 1.603 .477 1.034 5.271 2.807 7.338 1.230 10.491 22.167 97 .698 1.139 1.649 .384 1.584 8.855 3.342 7.600 1.532 18.097 41.142 173 1.030 1.820 2.081 .331 1.004 10.131 5.229 12.176 2.221 24.192 53.847 313 .870 1.433 2.732 .459 .838 8.937 4.891 9.610 3.077 22.829 106.282 557 .824 1.466 1.833 .712 .819 10.500 4.963 9.569 3.689 27.771 124.512 997 1 .550 1.411 2.240 .846 .852 14.159 5.194 14.060 4.204 29.577 134.117 Table 1.1 Values of YI'p<rN(p)/p for various R for the curve j/2 = x3 + ax What we can see from Table 1.1 is that l\'p<R N(p)/p is roughly constant in x for the cases that E(Q) has rank 0 and that it increases for the cases that £(Q) has rank > 0. Moreover, the rate of increase is distinctly larger when E(Q) has larger rank. More extensive calculation with many elliptic curves leads to the following first form of a conjecture. Conjecture 1.9 (Rough form of Birch and Swinnerton-Dyer Conjecture). For any elliptic curve E over Q, the rank r of £(Q) is given by the formula Y[' ^M ~(const)(logfl)r. p<R P Again N(p) is the number of Zp solutions (x,y), including the point at infinity, and the finite set of primes for which the reduction modulo p is singular are excluded from the product. To get a formulation of the conjecture that fits better into an established framework, one works with the L-series of the elliptic curve, which for current purposes is given by ^n'TTHESETTX- (L8) P l P' ^ p2. the finite set of exceptional primes being excluded from the product. It follows easily from Theorem 1.8 that the infinite product (1.8) is convergent for Re s > 3/2. Note that if we put s = 1 in (1.8), then the product formally becomes the reciprocal of y[ N(p)/p. The second form of the conjecture is stated in terms of the L-series.
18 I: OVERVIEW Conjecture 1.10 (Birch and Swinnerton-Dyer). For any elliptic curve over Q, the L-series L(s) extends to be entire for s £ C, and the order of vanishing of L(s) at s = 1 equals the rank r. A third, more detailed, form of the Birch and Swinnerton-Dyer Conjecture includes factors in L(s) for the exceptional primes and addresses the value of the coefficient of the first nonzero term in the power series for L(s) about s = 1. We shall not give a formal statement of this third version of the conjecture. Conjectures about L(l) do not even make sense without the analytic continuation in Conjecture 1.10. This analytic continuation is far from trivial and is itself the subject of the following two conjectures. The precise statements of these conjectures require some terms that will be defined later in the book. Conjecture 1.11 (Hasse-Weil). L(s), modified by elementary factors to a function A(s), extends to be entire, and the extension satisfies the functional equation A(2 — s) = ±A(s). If N is the conductor of the elliptic curve and m is any integer prime to N, then the same kind of conclusion is true of L($) when L(s) is twisted by a Dirichlet character modulo m. Conjecture 1.12 (Taniyama-Weil). Every elliptic curve over Q is modular in the sense of being parametrized by modular functions in a specific way. It turns out that Conjecture 1.12 is decidable for any particular curve. Moreover, Conjecture 1.12 implies Conjecture 1.11, and the converse implication is valid under a minor technical assumption on the curve. The depth of these conjectures is illustrated by the following theorem. Theorem 1.13 (Frey-Serre-Ribet), The Taniyama-Weil Conjecture implies Fermat's Last Theorem.
CHAPTER II CURVES IN PROJECTIVE SPACE 1. Projective Space Let k be a field. The projective plane P2(k) over k is defined as the quotient of {(x}y}w) £ k3 - {(0,0,0)}} by a relation ~, where (x', y', t//) ~ (x, y, if) if (x;, y\ w') = A(x, y, w) for some A 6 kx. When there is a need to be careful, we shall write [(x,y, w)] for the member of P2(k) corresponding to (x,y, tz;) in it3. But usually there will not be such a need, and we shall simply refer to (x, y, w) as a member of P2(k). A line defined over k is a nonzero polynomial L = ax -f by -f cw in i[x, j/, tu]. We regard L and a line V — a'x + b'y + c'w to be the same line if (a'yb',cf) is a multiple of (a,6,c). The locus L(k) - {(x, y, if) | ax -f by + cw = 0} is well defined in i^M and is called the set of k points or k rational points of L. It will be handy to refer to L(k) as a line in /^(^O- V L is defined over k and K is an extension field of ky it is meaningful to speak of L(K) as a subset of P2(A')« The affine plane k2 = {(x,y)} has a standard one-one imbedding into P2(k). Namely we map (x,y) into [(x,y, 1)]. The set that is missed by the image is the set where w — 0, which is the set of fc rational points of a line called the line at infinity. The points with w = 0 are called the points at infinity. Except for the line at infinity, lines in P2W correspond under restriction exactly to lines in k2. In certain ways the geometry of P2W is simpler than the geometry ofJt2: (1) Two distinct lines in P2(k) intersect in a unique point. In fact, we set up the system of equations 19
20 II: CURVES IN PROJECTIVE SPACE Since the lines are distinct, the coefficient matrix has rank 2. Thus the kernel has dimension 1, and there is just one point in the intersection. (2) Two distinct points in /^(fc) lie on a unique line. In fact, we set up the system of equations (*' y> 2) N = (2) and argue in similar fashion. Let <$ be in G£(3, k). Then $ maps the set k3 of all column vectors in one-one fashion onto k3 and passes to a one-one map of P2W onto P2(k) called the projective transformation corresponding to $. Two ^s give the same map of P2(k) if and only if they are multiples of one another. The group action of GL(3}k) on Pi{k) is transitive because <7L(3,fc) acts transitively on Jb3 - {(0,0,0)}. If L is the line whose coefficients are given by the row vector (a b c) and if $ is a projective transformation, then the row vector (a b c)$~l defines a new line L*, and the k rational points of L* are given by £*(*) = *(L(*)). In fact, let and satisfies be in L(k). Then (a 6 hence it is in L*(k). Conversely if is in $(L(k)) = 0; is in L*(fc), then x y l = $- satisfies (a 6 = (0 6 c)*"*1 = 0, and thus is 3> of something in L(k).
1. PROJECTIVE SPACE 21 About any point in P2(k) we can introduce various systems of affine local coordinates. Namely fix [(xq, yo,w0)] in /^(fc)- Choose (by transitivity) some $ in GL(3,k) with <£(x0,yo»wo) = (0,0,1). Then we can define local coordinates on ^~1(ifc x k x {1}) to A;2 by the one-one map V?(<^-1(x,y,l)) = (x,2/). This definition generalizes the standard imbedding of the affine plane k2 in P2(k) earlier; that imbedding was the case $ = /. Example 1. Suppose (xo,yo»^o) = (#o,2/o,l)- We can choose $ = 1 0 -x0\ 0 1 -2/o . Then ,0 0 1 / /x\ /l 0 -xQ\ fx\ /x-x0\ In this case, the local coordinates are defined on fc-^Jfe x*xl) = fcxA:xl and are given by = (p($~l{x - x0iy - y0ll)) = (a?-xo,y-3/o). This $ is handy for reducing behavior about (xo,j/0i 1) in ^M^) to behavior about (0,0) in k2. Example 2. Suppose (x0,yo,wo) = (0,1,0). We can choose $ = '0 0 1\ 10 0. Then ,0 1 0/ /x\ /0 0 1 $1 = 1 0 0 \wj \0 1 0 and V?(r, l,w) = ^(*~1(*(*> l,u/))) = y?(<I>""1 (u;,x, 1)) = (u/,x). This $ will be handy for studying behavior near the (unique) point at infinity on a cubic in Weierstrass form.
22 II: CURVES IN PROJECTIVE SPACE Affine local coordinates are useful in studying homogeneous polynomials in three variables. We say that a nonzero F 6 k[x} y, w] is homogeneous of degree d if every monomial in F has total degree d, and we write k[xy y, w]d for the set of such polynomials. Each such F satisfies F(\x} Ay,Xw) = \dF{xyyy w) for x,yyw,\ek. (2.1) Conversely, homogeneous polynomials over an infinite field can be detected by this property, according to Proposition 2.2 below. Lemma 2.1. If Jfc is an infinite field, then a nonzero polynomial / £ k[x\,..., xn] is nonzero at some point. Proof. We induct on n, the result for n — 1 being well known. Assume the result for n — 1, and suppose /(ci,... ,c„) = 0 for all (ci,..., cn). By induction, f(x\,..., xn_i, c) is the 0 polynomial in n — 1 variables, (2.2) for each choice of c. Fix a monomial x*1 • •■£n-'V in n — 1 variables. The monomials in n variables containing this (n — Invariable monomial contribute £*j*il''-*nl~i,*£ to /, and (2.2) says that ^j>o^cJ ~ ^ ^or a'* c* Therefore all the bj are 0. Repeating this argument for all monomials in n — 1 variables, we see that / is the 0 polynomial. Proposition 2.2. If k is an infinite field, then a nonzero polynomial F € k[x} yyw] satisfying (2.1) is homogeneous of degree d. Proof. Write F as the sum of homogeneous terms of different degrees: F = £?_! Fj with Fj of degree dj and with dx = rf. Then (2.1) gives \dF(xo}y0,wo) =\dFl(x0,yo}w0) + Xd2F2{x0}yo}wo) + -" + Ad*Fn(z0,ifo,u>o) for all A. Since fc is infinite, this equality for all A implies F(xo}yo, u>o) = Fi(xoyyoywo) and also Fj(x0,yo,wo) = 0 for all j > 2. Letting (ar0, yo, w0) varv an(l applying Lemma 2.1 to F — F\ and to Fj for j > 2, we obtain the conclusion of the proposition.
1. PROJECTIVE SPACE 23 A homogeneous polynomial F ^ 0 of degree > 0 is not a function on P2(k). Nevertheless we can examine the behavior of F near a point (x0,yo,u>o) in k3 - {(0,0,0)} by choosing $ in GL(3,fc) with $(x0,t/o>wo) = (0r0,1) and defining f(x,V) = F(*-l(x,y,l)). EXAMPLE 1. Suppose (x0,y0, u>o) = (*o>yo, 1) and /l 0 -x0\ * = 0 1 -2/o • \0 0 1 J Then f(x}y) = F(x + x0,yH-yo,l). For F(x, y, w) = x2y + xyw + w3, the corresponding f(xyy) splits into homogeneous terms as f(x, y) =(xly0 + x0yo + 1) + (x^jy + 2x0yox 4- x0y + y0x) + (yox2 + 2x0xy + xy) -f (x2y). We shall use this splitting in §2 in examining singularities and tangent lines. /0 0 1\ Example 2. Suppose (x0,yo,u>o) - (°> *»0) and $ = I 1 0 0 1. Then *"1 l " l X /(x,y) = F(2/.l.*)- For the same F as in Example 1, namely F(x} y, w) = x2y + xyu; + w3} we obtain /(z>y) = (j/2 + *2/) + (*3) with no terms of total order 0 or 1.
24 II: CURVES IN PROJECTIVE SPACE Conversely if d and $ are given and if / in ib[x,y] has degree d, we can reconstruct F. In Example 2 above with d = 3 and with / as above, define G(x, y, w) by inserting powers of w to make all terms of degree 3: G(xyyy w) = y2w -f xyw + x3. Then put F = G o <£ and recover F(x, y, w) = x2y + zyw + w3. In the special cases that k is R or C, /^(R) and P2(C) are smooth manifolds. It is helpful as motivation to recall the argument for /^(R)- To do so, we need to give compatible charts (U, <p)y where <p : U —► R2 is a homeomorphism onto an open set and where the sets U cover /^(R)- At points (x, y,u/) on the set C/3 = {uj ^ 0}, we can use <ps(x,y,w) = (—, —) • Viy it;/ This is just our affine local coordinate system about (xo,yo,u>o) = (0,0,1) with $ = 1. On the set U2 = {y / 0}, we can use /x w <p2(x,yyw) - I -,— \y y This corresponds to using (xo,yo,wo) = (0,1,0) and <£ = I 0 0 1 I . \o 1 0/ Similarly we can define a chart (/7i,^i). To have a manifold, we are to check that functions like <P2°<p31 are smooth. In fact, these functions are rational with nonvanishing denominators. For example, ¥>20<pal(xty) = <p2(x,y, 1) = with domain {(ar,y) |y i=- 0}. More generally, all our systems of affine local coordinates (defined via (xo, yo, wq) and $) yield compatible charts. Once iV) and /^(C) are smooth manifolds, partial derivatives are available. We shall see that partial derivatives provide a useful tool even when the field k is not R or C. 2. Curves and Tangents A plane curve or projective plane curve defined over k is simply a nonzero homogeneous polynomial F 6 k[xyyyw\d for some d > 0, except that we regard two such F's as the same curve if they are multiples • \y yJ
2. CURVES AND TANGENTS 25 of each other. As we remarked in §1, F is not well defined on P^k). Nevertheless, the zero locus of F yields a set in projective space. Namely, if K is an extension field of k, then the locus F(K) = {(xyy,w)\F{x)y,w) = 0} is well defined in /^(^O since F is homogeneous. This locus is called the set of K points or K rational points of the curve. In the special cases that d =. 1, 2, or 3, the curve is called a line, conic, or cubic, respectively. If F is a plane curve and if we have an affine coordinate system given by $ with 3>(xo,2/oj w°) = (0>0, *)> tnen tne corresponding affine curve is f(x,y)=F(*-l(x,yfl)) in k[xy y]. Among the k rational points of a curve, we shall distinguish between singular points and nonsingular points. Thus let F ^ 0 be in k[xyyy w]dy fix (xo,i/0)^o) € F(k), and choose affine local coordinates about (£o,t/o,u>o) given by some $ with $(xo,yo>w0) = (0,0,1). Let f{z,y) = F{*-l{z,yfl))ek[xfy]. For example, the situation in Example 1 of §1 had wo = 1 and F(xy y, w) = x2y -f xyw + w3, and we were led to /(x, y) = (xjj/o + xoyo + 1) + (zjjy + 2xoyo^ + ^02/ + 2/o«) 4- (y0z2 + 2x0xy + xy) + (x2z/) = /o(*,y) + /i(*,y) -I- /2(*,y) + M*,y)> The constant term /0 is 0 since (x0,yo, 1) is in F(&). In this example and in general, / is the sum of homogeneous terms of degree 1 through rf, say / = fi + • • • + fd with /lf..., fd depending on (x0, y0, u>0) and $. We say (x0,yo> u>o) is a nonsingular point if /i is not the 0 polynomial in k[x}y]. Otherwise (xo,yo,wo) is a singular point. In our example with tuo = 1) we are to consider /i(x, y) = (2x0y0 + yo)x + (xg -I- x0)y. The singular points in F(k) are those where the coefficients of x and y are both 0. The coefficients are 0 when (xo,t/o) = (0,0) and (x0yyo) =
26 II: CURVES IN PROJECTIVE SPACE (-1,0). (Back in P?{k), these are (0,0,1) and (-1,0,1).) But neither of these is in F(k). Hence F is nonsingular at every point of F(k) for which u/o = 1. We need to check that nonsingularity does not depend on the choice of $. Thus suppose also that ^(x0,yo,u>o) = (0,0,1). Then V o fc-1 = lc d 0 J with [a j invertible. Write /(x,y) = F(<P~l(x,y,1)) and g(x,y) = F(tf _1(x,y, 1)). We write heuristically f(x}y) = (Fo*-l)(*o*-l)(x}yy\) = (Fo ^_1)(ax -f 6y, ex -f dy^rx + sy + \) K 'V * '\rx + sy+ 1 *rx + sy+l} // = (rx + sy + 1) g[ r, r ) X,*, . / ax + by ex -f dy \ = (rx + 5y 4- 1 rf yi + • • ■ + 0* (——LJrT' —7 rr) V rx -f sy -f 1 rx + sy + 1 / = (rx -f sy + \)d~1g\{ax + 6y,cx + dy) + ■ • • + 0<f(a* + 6y,cx + dy). (2.3) This computation is valid in the quotient field fc(x,y). By expanding the various powers of (rx -f sy -f- 1) and regrouping by homogeneity in (x,y), we can read off /i(*>y) = yi(ax + 6y,cx + dy). (2.4) Similarly 0i(*,y) = /i(a* + /fy,7* + *y) with ( " <*) = (c d) So /i and yi are both zero or both nonzero. Proposition 2.3. Suppose F £ k[xtytw]m and G £ £[x,y,w]n are plane curves. If (xo,yo,u>o) is in F(k) C\G(k)) then (x0,yo,^o) is a singular point of FG. Proof. Choose affine local coordinates with 4>(xo,yo,u>o) = (0,0,1), and define /(x,y) = F(*-l(z,y% 1)) and y(x,y) = G(Q-l(x,y, 1)). Then we can write / = /i + f- /m and y = yi + f- yn- Since f{x,y)g(x,y) = {FG)(*-l(x,y,\)), and since /y has no first-degree terms, it follows that FG has a singular point at (x0,yo,u>o).
2. CURVES AND TANGENTS 27 We say that the plane curve F over k is nonsingular (or smooth) if F is nonsingular at every point of F(k)} where k is the algebraic closure of k. Otherwise we say F is singular. Nonsingularity at all points of F(k) does not imply F is nonsingular. For example, the curve F(x, y, w) = x3 - 6xw2 + 6yw2 - y6 is defined over k = Q, is nonsingular at every point of F(Q), and has a singular point at (x,y, w) = (\/2, \/2,1). So the curve is singular. Nonsingularity of the curve F implies irreducibility over the algebraic closure k> according to Corollary 2.5 below. The corollary depends on Bezout's Theorem, whose proof is deferred to §5. Theorem 2.4 (Bezout's Theorem). Suppose F 6 k[x^y^w\m and G 6 k[x}y}w]n are plane curves. Then F(k) n G{k) is nonempty. If it has more than mn points, then F and G have as a common factor some homogeneous polynomial of degree > 0. Corollary 2.5. Suppose F is a plane curve and is reducible over k. Then the factors are homogeneous polynomials, and F is singular. Proof. Write F — F\F2 nontrivially. Let d\ and ei be the highest and lowest degrees of terms in F\f and let d2 and e2 be the highest and lowest degrees of terms in F2. The product of the terms of degree d\ in F\ and the terms of degree d2 in F2 is nonzero and is the d\d2 degree part of F. The product of the terms of degree t\ in F\ and the terms of degree e2 in F2 is nonzero and is the exe2 degree part of F. Since F is homogeneous, d\d2 = eie2. It follows that d\ = e\ and d2 = e2\ thus F\ and F2 are homogeneous. Bezout's Theorem says that F\{k) H F2(k) is nonempty, and Proposition 2.3 says that any point in this intersection is a singular point for F. Hence F is singular. Suppose that (£o,2/o>^o) *s a nonsingular point of F(k) and that *(soi2/OiU>o) = (0,0,1). We have written f{x,y) = F(4>_1(x,y, 1)) with / = /i H h fd- Then f\(x} y) = px + qy with p and q in k} not both 0. Consider f\ as a polynomial in three variables fi(x}yyw) independent of w. We lift back to a nonzero member L — f\ o <I> of k[xyy,w]ly and the result is called the tangent line to F at (xo,yo,^o)- Let us see that the tangent line is independent of the choice of aflTuie local coordinates. Thus suppose also that ^(xo, 2/0, wo) = (0, 0,1). Form fa b 0\ g{x,y) = F(V-l(x,y,l)) and ^ o ^_1 = c d 0 , and let the
28 II: CURVES IN PROJECTIVE SPACE respective tangent lines be L* and Ly. Then we have M*> V>w) = h W*> y> ^))i (2.5) /l^.y^) = /i(*,2/) = gi(ax + byycx + dy) = gi(ax + by, ex + dy,rx + sy + w) = 0x(¥ o fc"1^, y, w)), L*(z,y,w) = £iW*,y,u>)) = 5i(* o *_1($(x, y, w))) = A (<*>(*, y, ™))- (2.6) Comparing (2.5) and (2.6), we see that L^(xyyyw) = Ly(x,yyw). If F is a plane curve and $ is a projective transformation, then F* = Fo$_1 is another plane curve, and the k rational points of F* are given by F*(k) = *(F(k)), by the same kind of reasoning as in §1 for the case that F is a line. This reasoning shows also that if (#o, !/o>wo) is a nonsingular point for F, then $(xo> Vo^o) is a nonsingular point for F*. From calculus we expect that nonsingularity can be decided in terms of partial derivatives when k is R or C. In fact, the argument that works for R or C works also for general k. All that is needed is a definition of partial derivatives, along with a few elementary properties. The definition of a partial derivative of a polynomial is clear, and it is routine to check that the product rule and the several-variable Chain Rule are valid in the context of polynomials. We find, at a nonsingular point, that the tangent line is given by a familiar formula from calculus, as in the following proposition. Proposition 2.6. Let F be a plane curve over k. If (xo, yo, ™o) is on the curve, then (xo,yo>u>o) is a nonsingular point if and only if at least one of ^, ^£, |~ is nonzero at (xo,J/o»^o)- In this case the tangent line L to F at (xo,yo>wo) IS glven by rflFi rflFi rdJFi L OX J(x0,yo,t"o) L #2/ J(*o,yo,u>o) LC7iyJ(ro,yo,tuo) Proof. Choose affine local coordinates with $(xo>2/o> wo) = (0,0,1). Since (xoiIfthWo) *s on tne curve, Fo$"1(x,y,i^) vanishes upon substitution of (0,0,1). Hence every monomial in F o $""1 contains a factor of x or y, and it follows that dn-{F o *"1)(0I 0,1) = 0 for every n. (2.7) dw]
2. CURVES ANP TANGENTS 29 We write f(xyy) = F($-l(xyyyl)) with / = /i + • • • +/<*. The coefficients of the linear term /i are f£(0,0) and f£(0,0). Since / = For1o((^)-(x,y)l))) the Chain Rule gives \dx dyj(00) \dx dy «W („,,„,„.) \Q Qj Let (x'\yf\w') be given. Multiplying on the right by ( , j, we obtain = («£ ££ «n ♦-! ir V«* 3y ^w(royoU,o) \^0y (29) By (2.7) for n = 1, the third entry of V5x dy 0w)(xoyoWo) is 0. Thus the right side of (2.9) is V&* 0y dw J{*o,*o.wo) \w'J Putting (x',y',t//) = $(x,2/,uj), we obtain L{xyyyw) = /! o$(x,y,w) - (<^1 ^ ^^ x y dx dy dwj(xoyowo)yw)' (21Q) By (2.8), we have a singular point if all first partials of F are 0. Otherwise (2.10) shows that fx is not 0; hence f\ is not 0 and the point is nonsingular. In this case, (2.10) gives the tangent line.
30 II: CURVES IN PROJECTIVE SPACE The quadratic term /20$(r, y}w) can be identified similarly, provided the characteristic of k is not 2. But the expression is more complicated and depends on the choice of $. It involves also the Hessian matrix of F, defined as H - H(x0%yo}w0) = dx2 d2F dxdy d2F d2F d2F \ dxdw 82F dxdy d2F dy2 d2F dydw d2F (2.11) \ dxdw dydw dw2 ) (x0,y0,u>o) Proposition 2.7. Suppose the characteristic of k is not 2. Let F be a plane curve over k of degree d} let (xo,yo>u>o) be a nonsingular point on the curve, and choose affine local coordinates with $(£(), yo> u>o) = (0,0,1). Put /(x,y) = For1!!,!/,!), let h(xyy) be the quadratic term in the expansion of f(x}y) about 0, and let /^(x, J/, w) = /2(x,y)- Then Q*(x, t/, w) = f'2 o <£(x, y} w) is given by 2<2*(x,y,u;) = (x y w ) H j y \ - 2(d - l)L'q>(xyyyw)L(x1yiw), (2.12) where H is the Hessian matrix of F at (x0, t/o, wo)> Z(x, y, w) is the tangent line at (x0,t/o,u>o), and L^(x, j/,u;) is a line with Z4(xo,yo> u>o) = 1. Proof. Let #* be the Hessian matrix of F* =z Fo^"1 at (0,0,1). We first show that In fact, the Chain Rule gives / dF* dF* dF* \ V dx dy dw Transposing gives / dF* \ dx dF* dy dF* ^ dw ' {xty,w) \ _ (dF_ dF_ J(x,ytw)~ \ dx QV 9F dw (2.13) / «I>- l(xtytw) = *-1' dx dF_ dy dF_ dw * <Jf-l(xtytw) \ f)<,n I
2. CURVES AND TANGENTS 31 Now we can apply the Chain Rule to each row of this equation and evaluate at (xyy}w) = (0,0,1). When the results are assembled, we obtain (2.13). The matrix of second partials of f(x} y)=Fo <I>~l(x, t/, 1) at (0, 0) is obtained by taking the upper left 2-by-2 block of //* and putting w = 1. Thus 2/2(*',t/) = (*' </)[ 21 dS dxdy dyi J (0,0) = (*' x/ 0)H (?) = («' y' w')H -2u/(0 0 \)H — w /2 <92f* dw2 (0,0,1) The last term on the right is 0, by (2.7) for n = 2. Putting I y' = $ I t/ I , we obtain V w I 2Q*(x,y,u;) = (x y w)// -2u/(0 0 l)//4 = ( x y iz;) // -2u/ <92F*1 [dxdw\ (001) x + <92F*1 dydw (o.o.i) (2.14) With d as the degree of F, the only monomial of F* that makes a contribution to times what ^~ d*F*] dxdwl(o,oti) is xwd"ly and the contribution is d - 1 J(0 gives. A similar remark applies to d dw
32 II: CURVES IN PROJECTIVE SPACE Thus the second term on the right of (2.14) is =_2(J_1K(f^l ,+r«£i ,) \i dx J(0,0,1) i °y J(o,o,D ) = -2(d-lK/i(i',y') = -2(rf- \)w'h{x',y',w') = -2(</- lK/i(4(*,y,i»)) = -2(d-l)w'L(x,y,w). The value of w' is the third entry of $ I y J , and this is a nonzero linear expression in x,yyw. We define it to be !/$(£, y,iu). 3. Flexes In this section we shall define the intersection multiplicity of a line and a plane curve at a point. The curve may be nonsingular or singular at the point. The notation for intersection multiplicity will be i(P, L, F), where P = (xo,yo> u>o) IS m F(k) fl L(k)y F is in k[x>yyw]d> and L is in *[*>!/» Hi- We introduce affine local coordinates. Choose $ with 4>(zo,yo,^o) = (0,0,1) and put /(*, y) = F^-1^, y, 1)) = /i (*, y) + • ■ ■ + /<*(*, y) /(x,y)=L(*-1(xly,l)). Since /(0,0) = 0, l(x>y) = bx — ay for some constants a and 6 not both 0. Then <p(t) = f J parametrizes /(x,y) = 0. The expression f(<p(t)) is a polynomial in t with f(<p(0)) = 0. In fact, /MO) = /iK W) + /2« 60 + • ■ ■ + /d« 60 = tfi{a,b) + t2f2(atb)+--+tdfd{a,b). There are two possibilities. If foip is not the 0 polynomial, then f(<p{t)) has a zero of some finite order at t = 0, and this order is defined to be the intersection multiplicity i(P, Z/, F). If / o tp is the 0 polynomial, we say that t(P, L, F) = +oo. It will be convenient to define z(P, L, F) = 0 if Pis not in L(k)CiF(k).
3. FLEXES 33 Wc need to check that i(P, L, F) does not depend on the choice of $. Thus suppose that \P(x0,yo»wo) = (0,0,1). Write /<* /J 0\ tfotf"1 =7 6 0 \r s 1/ f'(x,y) = F(*-Hx,yA)) = f'1{x,y)+-+f'd{x,y) i'(x,y) = LCP-'Oc.y, i)) = *'* - «'y (2.15) From (2.4), we have KX>V) = l'{<x2 + (3yf7Z + 6y). Since i(xty) = bx — ay> (2.15) gives 6 = b'a - ay and - a = b'(5 - a!6. Putting A = a8 — 0y, we thus have (:)-(A?)G) - (i)—(;;)(:)■ Now (2.3) gives So /(V>(0) = (aW +6rt -I- l)*-lAf[(a't,b't) + (art + bst + l^AVjfa'M'O + • • • + Adf'd(a'tyb't). From this equation we see that i(P, L,F) is the same when computed from / as when computed from /'. Proposition 2.8. If a line L and a plane curve F meet at a point P, then i(P,L, F) = +oo if and only if L divides F.
34 II: CURVES IN PROJECTIVE SPACE PROOF. If L divides F, then l{x,y) divides f(x,y). Since /(<£>(*)) is the 0 polynomial, so is f((p(t)). Conversely suppose f(<p(t)) is the 0 polynomial, so that /r(a,6) = 0 for all r with 1 < r < d. Without loss of generality, suppose 6^0. The equality 0 = /r(a, 6) = c0ar + Clar"lb + • • • + crbr =^(^(5)r+ci(ir1+---+^) says that X — — is a factor of 6r(coA'r -f c\Xr~x -\ \-cr). If we write 6 br(c0Xr + --+cr)= (*-£)«(*), then brfr{x,y) = yTbr(c0(^j + -+cr) ='(;-iH;)=»-"<-"(^(;))- Hence l(x,y) divides /r(z>t/). It follows that l(x,y) divides f{x,y) and then that L divides F. As was the case with the tangent line, there is an expression for intersection multiplicity that avoids affine local coordinates. For current purposes this expression, which we give as Proposition 2.9, is not too helpful, because it actually represents a disguised use of a particular system of affine local coordinates that may not be the most convenient one for applications. We shall not use this proposition until Chapter V. Proposition 2.9. Suppose F £ fcfz,*/,!/;]™ is a plane curve, L 6 k[xyyyw]i is a line, and P = (x0jyoyWo) is a point on L. If (x' 1yfiw/) is any point on L with \{x\y\w')\ ^ [(zo>2/OiWo)]i then i(P,L,F) equals the order of vanishing at t = 0 of i/>(t) = F(x0 + tx\ y0 + ty\ w0 + tw'). Proof. Let (x",t/",u/') be a point with L(x",y",w") = 1, and choose affine local coordinates $ so that
3. FLEXES 35 This is possible since the above three image vectors for <£ l are linearly independent. The corresponding local expression for L is then simply /(x,y) = Lo$ [ y j = x. The usual parametrization of/(x,y) = 0 comes from <p(t) = J. Thus we can compute the local expression for F as f(<p(t)) = Fo<&-1 j t J = F(x0 + tx',yo + ty\w0+tw'). Thus ip(t) = f{(f(t))} and the proposition follows. Suppose that P is in L(k) 0 F(k), that P is a nonsingular point of F, and that Lt is the tangent line to F at P. In the above notation, the local expression for Lt is h(x,y) = f\(x,y). Thus i(P,LlF)=l«=>/1(a,6)^0 «=>(a,6)tf LT(*) <=> image ^ £ ^r(^) L(Jfc) £ Lr(k) L is not the same as Lt- (2.16) Normally we should expect the tangent line at P to have intersection multiplicity 2 with F. We say that a nonsingular point P of F is a flex or inflection point of F(k) if the intersection multiplicity of the tangent line to F at P is > 3. Let us see how we would identify a flex from the definition. Suppose F is given and we want to check (xo,2/o>l)- As usual, we can take /l 0 -x0\ $ = j 0 1 —yo J , and then /(x, y) = P^'^x.y, 1)) is simply given \0 0 1 / by f(x,y) = F(x-f x0,y + 2/o, !)■ ^e rewrite / as a sum of homogeneous terms in x and y. Then (xo.yo, 1) is on the curve exactly if the constant
36 II: CURVES IN PROJECTIVE SPACE term is 0, the point is nonsingular if also the first order term fi(x,y) is not 0, and the point is a flex if also the second order term /2(x,y) satisfies /2(a,6) = 0, where a and b are defined by /i(x,y) = bx — ay. Proposition 2.8 may suggest that the condition on /2 is equivalent with the condition that /i(x,y) divide /2(^,y). In fact, we can see this equivalence algebraically. [If f2{x,y) = ex2 + dxy -f ey2, suppose ex2 + dxy + ey2 = (bx — ay)(rx + sy). Then c = 6r, d = —ar -f 6s, e = —as; hence ca2 -f- t/a6 -+■ e62 = 0. Conversely if ca2 + dab -f c62=0, we define r = c/6 and s = — e/a, checking separately the cases that 6 = 0 or a = 0.] Example. Let F(x,y,w) = x2y + xyw + w3. About (x0,yo,l) we have seen that /(x,y) = (xqI/o + *oyo + 1) + ((2x0y0 + yo)* + (*o + x0)y) + (s/o*2 + (2x0 + l)xy) + x2y. Assume that the point is on the curve, i.e., that xlyo + zoyo + 1 = 0. (2.17) Then /i(^y) = (2x0yo + yo)z + (*o + xo)y fi(z,y) = {Vo)x2 + (2x0 + l)xy. If we put b = 2x0yo 4- 2/o and a = — (xfj + £o). the condition for a flex is 0 = /2(a, 6) = y0a2 + (2x0 + l)a&, i.e., y0(*o + x0)2 - (2x0 + l)(*o + *o)(2x0i/o) = 0. (2.18) We can check that (2.18) is equivalent with the condition that /i(x,y) divide h{x>y). In the above example, one way to find all flexes (£o,yoil) would be to solve (2.17) and (2.18) simultaneously, discarding any singular points at the end. However, there is a more elegant way that does not involve passing to affine local coordinates; the only additional condition is that the field is not to have characteristic 2. We require two lemmas.
3. FLEXES 37 i Lemma 2.10 (Principal Axis Theorem). If char k / 2 and if A is a symmetric square matrix over ky then there exists a nonsingular square matrix # over it such that *trj4* is diagonal. Proof. We induct on the size, the case of size 1-by-l being trivial. Suppose the result is known for size (n — l)-by-(n — 1). If A has size n-by-n, write A = [ , tr , J with a symmetric of size (n — l)-by-(n — 1) and with d of size 1-by-l. If d ^ 0, let x = -d~lb. Then (o l) V6^ dJ Utr lj = \0 d)' and the result is reduced to the (n — l)-by-(n — 1) case and follows by induction. If d = 0, we may assume inductively that 6^0, say 6, ^ 0. Let y be an (n — l)-dimensional row vector with ith entry 6 and with other entries 0. Then // 0\f a b\fl y"\ /* * \ \y lj\btr 0j\0 l)~\* yaytv + btfytr + yb) = \* 62aii^2dbi) ' Since 2bi ^ 0, there is some value of 6 for which 62aa + 2<56» ^ 0. Thus we are reduced to the case d ^ 0, which we have already handled. Lemma 2.11. Let A = (Aij) be a nonzero 3-by-3 symmetric matrix over a field k with char k ^ 2. Then the conic C(xyy}w) = (x y w)A I y J associated to A is reducible over the algebraic closure k if and only if deti4 = 0. Proof. If C(x,y, tu) is reducible, Corollary 2.5 shows that the curve has a singular point (xo,yo,u>o). By Proposition 2.6 dc dc dc n f , . ax ay aw wo Hence detyl = 0.
38 II: CURVES IN PROJECTIVE SPACE Conversely let det A = 0. By Lemma 2.10, choose a nonsingular $ such that A1 = $_1 'A$~l is diagonal. Without loss of generality we lx may assume that A1^ = 0. Put I yf J = $ I y I . Then w C(x}y,w) = (x' y' w')A'\ ^ J = A\xxa + A'22y'2. The right side is reducible over k. Substituting back in terms of (x, y> w)y we see that C(x,y>w) is reducible over k. Proposition 2.12. Suppose the characteristic of k is not 2. Let F be a plane curve over k of degree d > 2, and let (zo, 2/o, u>o) be a nonsingular point on the curve. Then (xo,2/o, u>o) is a flex if and only if the Hessian matrix of F satisfies det H(xo,yo^o) = 0. Proof. Without loss of generality, we may suppose k is algebraically closed. Let L be the tangent line at (xo>2/o>u>o). Choose affine local coordinates with $(xo,2/o,wo) = (0A 1)> Put f{x^y) = ^(^'H*^ 1))> and let f\(xyy) and /2(x,y) be the first-order and second-order parts. The condition that (x0}y0}wo) be a flex is that fi(xyy) divide f2(xyy)} hence that L divide the conic Q<p of Proposition 2.7. Suppose that (xo,2/o>wo) *s a ^ex- Then L divides Q$, and (2.12) shows that L divides the conic defined by the matrix H(xo,yoywo). By Lemma 2.11, detH(x0}yo}wo) = 0. Conversely suppose det H(xo> yo>wo) = 0. By Lemma 2.11 the conic C(xyyyw) defined by the matrix H(x0}yQ}w0) is reducible. Let us say that C = LiL2, where L\ and L2 are lines. We now divide matters into cases. Suppose d > 2. In affine local coordinates defined by $, Q* becomes homogeneous quadratic in (x}y)} and the term -2(d- l)L'*(x,y,w)L(z,y9w) becomes —2(d — l)/i(x, y) + higher order. Therefore C has 2(c/ — \)L as tangent at (x0,yo,wo). But C = ^1^2 implies C has tangent Li or L2 at (ar0,2/o, ^0), up to a constant factor. Hence L is a multiple of L\ or L2, an^ £ divides C. By (2.12), L divides Q*. Thus (xq,yo,^o) is a flex.
3. FLEXES 39 Now suppose d = 2. We still have C = LiL2. The second partials of C are twice the entries of H(x0yyQyw0). Since C and F are both conies, C = 2F. Then F = \L\L2, and easy computation shows that the nonsingular points are the points of F(k) not in L\(k) H L2(k) and that each such is a flex. EXAMPLE. Let F(x}yyw) = x2y -f xyw -f w3. The Hessian matrix at (x0lr/o,l) Js (2y0 2x0 + 1 2/o \ 2x0 + l 0 x0 . yo *o 6 / The condition in Proposition 2.11 that the determinant vanish says that 2y0(x20 + x0) - 6(2xo + l)2 = 0. (2.19) This is ostensibly different from (2.18). But use of (2.17) reduces both equations consistently to 3(2x0 + 1)2 = -1. Corollary 2.13. Over an algebraically closed field of characteristic not equal to 2, a nonsingular conic has no flex, while a nonsingular plane curve of degree > 2 has at least one flex. Proof. Let the curve be F. Since F is nonsingular, Proposition 2.12 says that the flexes are all the common solutions of F(xyyyw) = Q and det H(xyyfw) = 0. If deg F > 2, Bezout's Theorem says there is a solution. If deg F = 2, then H(xyyi w) is a constant matrix H, and we have to show that det H is not 0. Let C(xyyyw) be the conic defined by 11. At the end of the proof of Proposition 2.12, wesaw that C = 2F. If det// = 0, Lemma 2.11 says that C is reducible, and thus F is reducible, in contradiction to Corollary 2.5. If F has degree d > 2, then det // has degree Z{d — 2). The converse half of Bezout's Theorem says that an irreducible F of degree d (over a field of characteristic not 2) has < Zd(d — 2) flexes unless F and detH have a common factor. One can show that a plane curve F and its det H have no common factor unless F is a product of lines; in particular, F and det H have no common factor if F is irreducible.
40 II: CURVES IN PROJECTIVE SPACE 4. Application to Cubics The most general cubic is F(r, y, w) =cyyyy3 + cxyyxy2 + cxxyx2y + cyywy2w + cxywxyw + CywwVW2 + CXXXX3 + C^^X2^ + Cxti/wX^2 + C^^W3. (2.20) We examine the algebraic effect of some geometric conditions that we might impose on such a curve. We shall make calculations in local coordinates, despite the availability of Proposition 2.6, because we do not want to exclude k of characteristic 2 when we deal with flexes. Condition 1. The curve is to pass through (r,y, w) = (0,1,0). Equivalently cyyy = ^* Condition 2. In addition, the curve is to have (x,y,w) = (0,1,0) as a nonsingular point. To examine this condition, we choose $ = /0 0 1\ II 0 0 J as the map that carries (xo}y0}w0) = (0,1,0) to (0,0,1). \0 1 0/ Then f(z,y) = F(*-1(x,y,l)) = F(y,l,x) 2 2 Cxxyy i CyywX 4" CxywXy *t Cytyu/2? + crrr2/3 + cxxwy2 + cxww y* + (2.21) Hence The condition is that CyylU ^0 Or CXyy ^ 0. Condition 3. In addition, the curve is to have L(x}y,w) = w as tangent line at (0,1,0). (Recall that this line is the line at infinity.) Referring to (2.21), we have h{x,y,w) = cyyxvx + cxyyy.
4. APPLICATION TO CUBICS 41 Hence L(x, y, w) = /i($(x, y, w)) = /i(u>, z, y) = cyyww + cxyyx. The condition is that In view of Condition 2, we automatically have cyyw t 0) and then the tangent line is the same line as w. Condition 4. In addition, the curve is to have a flex at (0,1,0). Referring to (2.21), we have /2(*,2/) = cywwx2 + cxywxy + cxxyy2 and /l(x»2/) = cyywX. The condition that /i divide f^ is that cxxy = 0. The condition that i((0,1,0), w,F) be defined is that w not divide F, hence that CXXX ^F ^* When all four conditions are satisfied, the cubic becomes F(xyy,w) - cyywy2w + cxywxyw + cywwyw2 3 "* 2 3 i Cxxz^ i Cxxxv^ ^ i CXXUU)X'W ~j~ CyjiyyjW (2.22) with cyyti, ^ 0 and cxxr ^ 0. Proposition 2.14. If F is a cubic over k such that F(k) has a flex (#o> 2/o>u>o)) then there exists a projective transformation <S> such that F* is the same curve as y2w +a\xyw + asyw — x3 — a.2X2w — aixw2 — clqw3. (2.23) PROOF. Choose $! so that $i(*o,2/o, ™o) = (0,1,0). Then F*1 has a flex at (0,1,0). Let L(x}yyw) = ax + (iy be the tangent to F*1 at
42 II: CURVES IN PROJECTIVE SPACE (0,1,0), choose a nonsingular matrix I , 1 with aa -f (3c — 0, and define (a 0 b\ *2X = 0 1 0 . \c 0 d) Then $2 has the property that Z*3 is the same line as w. Hence (F*i)*2 = ^*3*i ^33 a flex at (0,1,0) and has w as tangent. The discussion above shows that F*3*1 has the form (2.22). Put It 0 0\ *J* = 0 t 0 \0 ° V with * to be specified. Then (F*2*1)*3 = F*3*2*1 has F*3*>*> (x, 2/, u;) = F*3*> (**, <y, u/) = cyyw(t2y2w) + • • • + cxrx(f3x3) + .... Taking tf = cyyw/cXXXi we see that the coefficients of t/2iu and x3 in F*3*2*1 are the same. Thus $ = <J>3<$2$i is the required transformation. A cubic of the form (2.23) is said to be in Weierstrass form. The corresponding affine Weierstrass form, achieved by putting w = 1, is written as an equation y2 + a\xy + a3y = x3 + a2x2 + a4x + ac. (2.24) The notation in (2.24) is standard; the subscripts will be seen in Chapter III to indicate the degree of homogeneity of the corresponding term under a certain change of variables. By Corollary 2.13, a nonsingular cubic over an algebraically closed field of characteristic not equal to 2 necessarily has a flex. Hence it can be put in Weierstrass form by a projective transformation. An elliptic curve over fc is a nonsingular cubic over k that is in Weierstrass form. In testing a curve (2.23) for nonsingularity, note that (0,1,0) is the only point at infinity on the curve, and it is nonsingular by Condition 2. Hence only points (xo,t/o> 1) need to be checked.
4. APPLICATION TO CUBICS 43 Proposition 2.15. Let P be a nonsingular cubic over &, and let L be a line. Then Yip {(Pi U F) is °> 1. or 3. REMARKS. (1) If P ^ Q are in F(k), let L be the line containing P and Q. Then the proposition says there is a third point on L(k) D F(k) if we count multiplicities. (This third point may coincide with P or Q.) We define PQ to be the third point. (2) To define PP} let L be the tangent to F at P. Then f(P, L, P) > 2. The proposition says that there is a third point on L{k) f) F(k) if we count multiplicities. This third point will be P if P is a flex of P, and it will be some different point if P is not a flex. In either case, we define PP to be this third point. The rule for determining PQ in this paragraph or the previous one is called the chord-tangent composition law. Proof. Let P be as in (2.20). Without loss of generality the sum is not 0. Thus take P = (xo>2/o, wo) to be in L(k)DF(k). Since everything is invariant under projective transformations, the same argument as in the proof of Proposition 2.14 shows we may assume that (zo.l/Oi^o) = (0,1,0) and that the tangent line there is w. Conditions 1 and 3 show that Cyyy = cxyy — 0. We consider some possibilities for L. (1) L = w. Putting w = 0 in P gives P(x,y,0) = cxxyx2y + cxxxx3. (2.25) (la) Suppose cxxy = cxxx = 0. Then w divides P, and Corollary 2.5 shows that P is singular, contradiction. (lb) Suppose cxxx ^- 0 and cXXy — 0. Then Condition 4 shows that (0,1,0) is a flex with z(P, Lt F) defined and > 3. Since z(P, L, F) cannot be > 3 for a cubic, i(Py L, P) = 3. The expression (2.25) cannot be 0 without x = 0, and so there are no other points in L(k) D F(k). So L(k) fl F(k) contains P with multiplicity 3, and it contains no other point. (lc) Suppose cxxy ^ 0. Then Condition 4 shows that (0,1,0) is not a flex, hence that i{P,LtF) — 2. Meanwhile (2.25) produces a unique additional point Q on L for which (2.25) is 0; we can write Q = (1,2/i, 0). To complete the study of this case, we need to show that i(QyLyF) ^ 0. / 0 0 1\ Define $ = -yi 1 0 J, so that 4>(l,yi,0) = (0,0,1). Then V 1 0 0/ f(x,y) = F(<p-l(x,y,l)) = F(l,y + yi,x) l{x,y) = L(*-l(x,y,\)) = L{\,y + yux) = x.
44 II: CURVES IN PROJECTIVE SPACE Calculating /(x,y), we find that )x + (cxxy)y. Since cxxy ^ 0, fi(x,y) is not a multiple of l(x,y). Hence L is not tangent to F and Q, and i(Q,L,F) = 1 as required. (2) L is not the same as w. Since L goes through P = (0,1,0) and is not the tangent, we have i(P, L,F) = 1. Assume that i(Q, L, F) > 1 for some Q ^ P. We may write Q = (xo,t/0)l)« Transforming by /l 0 -*0\ $ = 0 1 -ifo ), which has $(P) = P and $(Q) = (0,0,1), we \0 0 1 / may assume from the outset that Q is (0,0,1). Then L, which passes through both P and Q, is the line L = x. Using $ = /, we form /(x, y) = F(x, y, 1) = cxxyx2y + cyytut/2 + cxyu,xy + cyu;tz,y 3 2 i CXXXX ~7~ CXX\i)X ~r Cxww*' and /(x,y) = L(x,y, 1) = x. Then /(x,y) = bx — ay with a = 0 and 6=1. Putting y?(<) = f . 1, we have /(?(<)) =t(cx tun/" b) + t2(cxru;a2 + c,.yti,a6 + cyyu,62) + <3(cxrjca3 + cxrya26) = *Cyiyti, T" * Cyyw • ^Z.ZOj Condition 3 shows that cyyu, ^ 0. If cyt/;u, = 0, then (2.26) shows that i(Q,L,F) = 2 and that f(x,y) vanishes on the locus l(xyy) = 0 only at (x,y) = (0,0); hence P and Q are the only points contributing to the sum in question, and the sum is 3. If cyww ^ 0, then (2.26) shows that i(Q,L,F) = 1 and that there is just one more point R where i(R,L,F) > 0; interchanging the roles of Q and R shows that i(R> Ly F) = 1 and hence that the sum is 3. 5. Bezout's Theorem and Resultants Before returning to Bezout's Theorem, we introduce the resultant R{fyg) of two polynomials. Let A be a unique factorization domain, e.g., k[x\,..., xr]. For / and g in A[X] with /(*) = oo + a,^ + ..- + araJfm g(X) = b0 + blX+-+bnX", { " '
5. BEZOUT'S THEOREM AI(JD RESULTANTS 45 let [R(f} g)] be the (m + n)-by-(m -f n) matrix faQ ai ••• am_i am 0 0 ••• 0 \ 0 a0 ••• am_2 am-i am 0 ••• 0 0 •■• a0 ••• flm bo bi 6n_! 6n 0 ••• 0 0 b0 6n_2 bn-i bn ••• 0 V 0 • • • 60 &i • • 6n / in which there are n rows above the 6o in the first column and there are m remaining rows. Let R(fyg) be the determinant R(f,g) = det[R(f9g)]. Proposition 2.16. With / and g in A[X] of the form (2.27), the following are equivalent: (1) / and g have a common factor of degree > 0. (2) af + bg = 0 for some nonzero a and 6 in A[X] with dega < n and deg6 < m. (3) R[f,g) = 0. When R(fyg) ^ 0, there exist nonzero a and 6 in A[X] with dega < n, deg6 < m, and a(X)f{X) + b(X)g(X) = fl(/,<7)- (Here R(ftg) is to be regarded as a polynomial of degree 0.) Proof. (1) => (2). If u \ f and u \ g, write f — bu and # = —au. Then (2) holds. (2) => (1). Factor / and bg. If (1) fails, then the prime factors of/ of degree > 0 must occur in the factorization of 6, with multiplicities. So deg/ < deg6, contradiction. (2) O (3). For any a and b of the form a(X) = ao + axX + • •. + an_1Xn"1 &(;0 =/?0 + A* + • • • + An-iX"1"1 with coefficients in the quotient field A of >4, we have (2.28) (<*0 <*! ••• Otn-l /?0 •• /?m-l )[R{f, 9)] = (c0 Q ••• cm+n_!), (2.29a)
46 II: CURVES IN PROJECTIVE SPACE where a(X)f(X) + b(X)g(X) = c(X) = c0 + cxX + • • • + cm+n-i*m+n-1. (2.29b) If nontrivial a(X) and b(X) exist with coefficients in A such that c(X) = 0, then ker[R(fyg)]tr ^ 0 and R(f,g) = 0. Conversely if fl(/,flr) = 0, then ker[fl(/,^)]tr ^ 0 and there exist nontrivial a(X) and b(X) with coefficients in >4 such that c(X) = 0. Clearing denominators, we obtain a(X) and b(X) with coefficients in A such that c(X) = 0. Last statement: If R(fyg) jL 0, then the cofactor formula for the inverse of [R(f} g)] with entries in A says that IW.g)]-1 = R(f,g)-l[S{f,g)], where [5(/,^)] has entries in A. Thus we can define elements a0,... ,an-i,A)>-- ■,]8m-i in i4 by (a0 «1 ••• «ri-l 00 *•• &n-l) = (*(/,*) 0 ... 0)[/?(/, (z)]-1. If we define a(X) and 6(X) in yl[Ar] by (2.28), then (2.29) shows that a(X)f(X) + b(X)g(X) = Rif9g). Proposition 2.17. If A = k[xi>... ,xr] and if / and g are as in (2.27) with cij homogeneous of degree m — j and 6; homogeneous of degree n — j} then R(f}g) is homogeneous of degree mn. PROOF. For an exponent u obtained by factoring powers (tVn-\...Jr,r-1l...) tnam •• \ of < from the rows of the matrix displayed below and for v obtained by factoring powers (£m+n,*m+n~1,...) from the columns, we have /tntma0 tntm'lai • | 0 tn-ltma0 • tuR(f,g)(txu..-,txr) = det I tmtnb0 tmtn'lbi ••• tmbn \ 0 tm"ltnb0 ••• / = tVR(f,9)(*l,~.,Xr). So R(f,g)(tx) = tv~uR(fyg). Computing u and v, we find that u = ^m(ra-f l)+^n(n + l) and i; = ^(m-f n)(m + n + l), so that v — u = mn. By Proposition 2.2 applied to fc, R{f,g) is homogeneous of degree mn.
5. BEZOUTS THEOREM AND RESULTANTS 47 Let us restate Bezout's Theorem, originally given as Theorem 2.4. Theorem 2.18 (Bezout's Theorem). Suppose F 6 k[xy yyw]m and G € k[xyyy w]n are plane curves. Then F(k) D G(k) is nonempty. If it has more than mn points, then F and G have as a common factor some homogeneous polynomial of degree > 0. Proof of first conclusion. Write F and G in the form F(xyyyw) = a0 + axu; + • • • + amwm with a; 6 k[xyy]m-j ( . G(x, y, w) = 60 + biw -f h bnwn with 6;- € fc[x, y]n-j. Regarding F and G as polynomials in w, with coefficients in A = k[xyy]y we form R(FyG)y which Proposition 2.17 identifies as a member of k[x>y]mn- Since R(FyG)(x,y) is homogeneous and k is algebraically closed, we can choose a point (xo>2/o) ^ (0,0) where R{FyG)(xoyy0) = 0. Then the resultant of F(x0y y0y w) and G(xoyyoyw) is 0, and Proposition 2.16 says that these two polynomials in w have a common factor. Since k is algebraically closed, this common factor vanishes at some w = w0, and then we must have F(x0yyoywo) = G(xo>2/o, u>o) — 0. Proof of second conclusion. Suppose F(k) n G(k) contains mn -f 1 points. Join these points by lines, and pick a point defined over k that is not on any of the lines. Applying a projective transformation, we may assume the point is (0,0,1). Write F and G in the form (2.30). Regarding F and G as polynomials in wy with coefficients in A = £[2,2/], we again form R(FyG)} which Proposition 2.17 identifies as a member of k[xyy]mn. For fixed (xo,2/o), Proposition 2.16 says that R(FyG)(xoyyo) — 0 if and only if F(xoyy0yw) and G{x0yy0yw) have a common factor (hence necessarily some w — wq factor since & is algebraically closed), if and only if F(xQyy0yw0) = G(x0yy0ywQ) = 0 for some wo- So at each of our mn + 1 points, say (x,-,yi}Wi)y we have R(FyG)(cxiycyi) = 0 for all c. Since (xiyyi) ^ (0,0), R{FyG) vanishes on the line y,x — x,y = 0. Consequently t/,x — x,y divides R(Fy G). Suppose (xiyyi) = (xjyyj). Then (xiyyiyWi) and (xjyyjyWj) both satisfy y,x — xty = 0. Since (0,0,1) satisfies this also and since (0,0,1) is not to be on any of the connecting lines, we obtain a contradiction. Thus the mn-J-1 factors yix — Xiy are distinct primes in k[xy y] dividing R(FyG). By unique factorization, their product divides R(FyG). Since degR(*FyG) = mny we conclude R(FyG) = 0. Then Proposition 2.16 shows that F and G have a common factor in £[x,t/][w] = k[xyyyw). The common factor is homogeneous by the first conclusion of Corollary 2.5 (which does not depend on Bezout's Theorem).
48 II: CURVES IN PROJECTIVE SPACE Corollary 2.19. Suppose F € k[x,y,w\d is a plane curve and L is a line. If F(k) C\ L(k) has more than d elements, then L divides F. This condition is satisfied if F vanishes on L(k) and Jfe has > d elements. Proof. The number of points on L(k) is one more than the number of elements of fc, hence is > d+1. Then F(k)C\L(k) has > d+1 points, and Bezout's Theorem says L and F have a common factor. Since L is prime and fc[x,y,u;] has unique factorization, L divides F. The full-strength version of Bezout's Theorem says that two plane curves F and G of degrees m and n meet in < mn points even when multiplicities are counted, and that the number is = mn if k is algebraically closed and multiplicities are counted. We do not need the full-strength version of Bezout's Theorem and have consequently not defined intersection multiplicity in full generality. The result that we shall prove instead is the corresponding sharpening of the special case of Corollary 2.19 in which one of the curves is a line. Corollary 2.20. If F G k[xyy, w]d is a plane curve and L is a line such that J2P i(P>L>F) > d, then L divides F. Proof. Without loss of generality, we may assume k is algebraically closed and is in particular infinite. Suppose L does not divide F. Then Corollary 2.19 shows that F(k)DL(k) is finite, and it follows from Proposition 2.8 that £V *(P> £> F) ls finite. Possibly by applying a projective transformation, we may assume that the line L is w = 0. Then the points Pj with i(Pj,L, F) > 0 are of the form [(xjy VjyO)]. Possibly by applying a second projective transformation, one that translates the y variable, we may assume that no yj is 0. Then we can write Pj = [(rj,l,0)] with rj in k. Now ^(£,1,0) is a polynomial in x of degree < d. We shall prove that i((ry 1,0), w, F) = multiplicity of r as a root of F(xy 1,0). (2.31) Since k is infinite, there is some r not in F(k) D L(k), and this r in (2.31) shows that F(x,l,0) is not the 0 polynomial. Hence F(x,l,0) has < d roots, counting multiplicities, and it follows from (2.31) that Dp ,#(P» L, F) < d, as required. To prove (2.31), we introduce affine local coordinates about (r, 1,0), /l 0 r\ using fc-1 = I 0 0 1 J , so that $(r, 1,0) = (0,0,1). The local ver- \0 1 0/ sions / of F and / of L = w are f(x,y) = F(<t>-\x,y,l))= F(x + r,l,y) l(x,y) = L(*-l(x,y,l)) = y.
5. BEZOUT'S THEOREM AND RESULTANTS 49 Thus /(x,y) is of the form bx — ay with a = — 1 and 6 = 0. We can parametrize / by (p(t) = (.) = ( J, and then /(p(0) = /H,0)=F(-« + r,l,0). The order of vanishing of f(ip(t)) at t = 0, which is i((r, 1,0),L, F), thus equals the order of the 0 of F(—t + r, 1,0) at t = 0, which equals the multiplicity of r as a root of F(x, 1,0). This proves (2.31), and the corollary follows.
CHAPTER III CUBIC CURVES IN WEIERSTRASS FORM 1. Examples We begin with some examples of plane curves, showing how to bring them into Weierstrass form and discussing some of their solutions. A. Diophantus cubic example. The equation from Chapter 1 in affine form is y(6-y) = x3-x. (3.1) Following the procedure in the proof of Proposition 2.13, we replace y by — y and x by — x to obtain y2 + 6y = x3 - x. If we complete the square and replace y + 3 by y, we are led to y2 = x3 _ x + g (3 2) The trivial solutions of (3.1) were x = 0 or ±1 with y — 0 or 6. These correspond in (3.2) to x = 0 or ±1 with y = ±3. Diophantus's nontrivial solution of (3.1) began with P = (—1,0) and amounted to using PP = (■y, ||); here PP is the result of applying the chord-tangent composition law (§11.4) to P and P. In terms of (3.2) we would take the corresponding trivial solution P = (1,3) and use PP = (-^, ff). B. Cubic case of Fermat's Last Theorem. The equation in projective form is x3 + y3 = t/;3. (3.3) Since we can always clear denominators, integer solutions and rational solutions amount to the same thing. With F(xyyyw) = x3 + y3 — w3} we have dF 2 dF 2 dF 2 — = 3x , — = 3y , — = -3ur. ox oy ow 50
1. EXAMPLES 51 The only place where all first partials are 0 is (0,0,0), which does not correspond to a point in projective space. By Proposition 2.6, F is nonsingular. To look for flexes, we compute the Hessian determinant /6x 0 0 \ deti/(x,y,w) = det I 0 6y 0 = -63xyw. \ 0 0 -6w) We seek simultaneous solutions of (3.3) and xyw = 0. (3.4) For example, we can use (x0,yo> ™o) = (0,1,1). To map this to (0,1,0), which is the point at infinity of a curve in Weierstrass form, we ap- /l 0 0\ ply successively 10 1 -1 J to carry (0,1,1) to (0,0,1) and then \o o i; 1 0 0\ 0 0 1 J to carry (0,0,1) to (0,1,0). The tangent y - w maps first ,0 1 0/ to y and then to w. Hence maps F into Weierstrass form except for scaling. To carry out the computations, we put (a b c) = <^(x y w). Then 6 = 6 + c Substituting in (3.3), we obtain a3 + (6 + c)3 = 63. Passing to affine form by replacing (a}b}c) by (xyy} 1), we obtain 3y2 + 3y =-x3 - l.
52 III: CUBIC CURVES IN WEIERSTRASS FORM We scale by changing y to —3y and x to —3x, and the result is y* - ly - ~3 _ J_ y ^y —x 27* We make a further adjustment to clear denominators: We change y to y/u3 and x to x/ti2 with u = 3. Then the equation becomes y2 - 9y = x3 - 27. (3.5) Fermat showed that the only Q-rational (projective) solutions of (3.3) are the trivial ones: (xyy%w) must be one of (1,0,1), (0,1,1), and (1,-1,0). The corresponding solutions of (3.5) are (3,9), oo, and (3,0). For future reference let us record the composite transformations from (3.3) to (3.5) and back. If (X,Y) satisfies X3+Y3 = 1 and (x,t/) satisfies y2 — 9t/ = x3 — 27, the relationships are 3X y v- 9 Y = i—- x = and { 1 gY (3.6) y= T^V' These are the transformations that we used in discussing u3 + v3 = w3 in Chapter I. Equation (3.5) can be transformed some more. Completing the square eliminates the y term and leads to y2 = *3-f. (3.7) Scaling by (x,t/) —► (x/u2,y/u3) with u = 2 leads to y2 = x3-24-33. (3.8) C. Congruent numbers. A positive integer n is congruent if n is the area of a rational right triangle. For example, the Pythagorean triple (3,4,5) has area 6. So n = 6 is congruent. Congruent numbers were studied as early as 984 A.D., were taken up by Fibonacci in 1225, and were later studied by Fermat. The problem is: Given n, decide whether n is congruent. Results for small square-free n are as follows: 1 not congruent Fermat 2 not congruent Fermat 3 not congruent Fermat 5 congruent for (|, ^, ~) Fibonacci 6 congruent for (3,4,5). Part of the connection between the congruent-number problem and elliptic curves is given in the following proposition. We shall study the relationship further in Chapter IV, and we shall prove there that n = 1,2,3 are not congruent.
1. EXAMPLES 53 Proposition 3.1. If n is a square-free positive integer, then the following are equivalent: (1) n is congruent: n = ^ab, where (a,6,c) is a rational Pythagorean triple. (2) There exist three rational squares in arithmetic progression with common difference n. Moreover, these conditions imply (3) There exists a Q rational point (x,y) on y2 = x3 - n2x (3.9) other than (—n,0), (0,0), (n,0), and oo. Examples. Arithmetic progressions meeting the second condition for the congruent numbers 5 and 6 are — (if.?) - ((SHS)'-®*)- PROOF. (1) => (2). Given (a,6,c), put x = c2/4. Then (a - 6)2/4 = x — n and (a + 6)2/4 = x + n, so that x — n and x and x + n are all squares of rational numbers. (2) => (1). Given x such that x — n and x and x + n are all squares, put a = (x + n)i/2 + (x_n)i/2 6 = (* + n)1'2-(*-n)1/2 c = 2x1'2. Then a,6,c are rational, and a2 + b2 — c2. (2) => (3). If x is the middle number in the progression, then the product of the three is x3 — n2x and is a square. Hence (3.9) holds with x as the middle number of the progression. The progression cannot be —n, 0, n since n is square-free. Thus the Q rational point in question satisfies (3). Remark. In Corollary 4.3 we shall prove conversely that (3) => (2).
54 III: CUBIC CURVES IN WEIERSTRASS FORM D. Quartic case of Format's Last Theorem. Fermat showed that x4 -f y4 = z4 has no nontrivial integer solutions by showing that u4 + v4 = w2 (3.10) has no nontrivial integer solutions. We give his proof as Proposition 4.1. Equation (3.10) is not homogeneous, but we can still study it as f —) + 1 = f — J . Thus the equation to study for Q-rational point! s is y2 = z4+l. (3.11) If we make the nonprojective change of variables x —► x and y —► y + x2, we are led to the cubic equation y2 + 2x2y=\ affine 2 2 3 \O.LZ) y w + 2x y = w projective. We can verify that (3.12) is nonsingular and can look for Q-rational flexes in the same way as for (3.3). The only Q-rational flex is at (xyyyw) = /0 1 0\ (1,0,0), where the tangent line is y — 0. We apply ^i = II 0 0 I in order to send (1,0,0) to (0,1,0) with the tangent becoming x = 0, and /0 0 1\ then we apply <^2 = I 0 1 0 in order to fix (0,1,0) and to make V1 ° °/ the tangent become w = 0. The resulting equation is 2y2 = x3 - x. Scaling by sending y to 2y and x to 2x changes the equation to 2 3 1 V = x° - \x. We can make the coefficients be integers by sending y to i^y and x to \x\ the result is the elliptic curve y2 = x3-4x. (3.13) Since (3.10) has only trivial solutions, the only Q-rational solutions of (3.13) are oo, (0,0), and (±2,0). We shall prove this result directly in §IV.7.
1. EXAMPLES 55 For future reference let us record the composite transformations from (3.11) to (3.13) and back. If (X, Y) satisfies Y2 = X4 + 1 and (x,t/) satisfies y2 = x3 — 4x, the relationships are x = X 2x Y y2 + 8* 4*2 . > and < X = y = *"-** (3 14) 4X v ' Y -X2' The quattic Fermat equation x4 + y4 = z4 can be treated also as u4 + v2 = w4. This equation becomes y2 = x4 — 1, and an analysis like the one above reduces it to the elliptic curve y2 = x3 + 4x. (3.15) The Q-rational solutions of (3.15) are oo, (0,0), and (2, ±4). E. Fermat's problem for Mersenne. In 1643 Fermat posed to Mersenne the problem of finding a relatively prime integer-valued Pythagorean triple (X} Y, Z) such that the hypotenuse Z is a square and the sum of the legs is a square. The answer that Fermat had in mind is X = 1061652293520 Y = 4565486027761 (3.16) Z = 4687298610289, and this is the smallest nontrivial solution. In algebraic terms, we are given X2 + Y2 = Z2, Z=62,X + 7 = a2. If we write e = X - Y, then X and Y are \(a2 ± e). Thus 64 = Z2 = X2 + Y2 = i(a4 + e2) and e2 = 264-a4. (3.17) Dividing by a4, we are led to the affine curve v2 = 2u4-l. (3.18) If (u, v) satisfies (3.18) and if (x, y) is defined by 2(v + 2u2 - 1) x = («-i)2 4[(2tt - l)v + 2u3 - 1] («-D3 (3.19)
56 III: CUBIC CURVES IN WEIERSTRASS FORM then (x, y) satisfies y2 = x3 + 8x. (3.20) Conversely if (x,y) satisfies (3.20), then (u,v) may be recovered from _ y - 2x - 8 "~ y - Ax + 8 (3 21) _ y2 - 24x2 + 48y - 16s - 64 V ' ; V " (y-4x + 8)2 so as to satisfy (3.18). For example, (x,y) = (0,0), (1,3), and (1,-3) correspond to (u,v) = (-1,-1), (-1,1), and (-13,-239). But there are some singularities: (u, v) = (13,239) maps to (x,y) = (8,24), but (x,y) = (8,24) maps to a removable singularity. Also (x,y) = (8,-24) maps to (ti,v) = (1,-1), but (u,v) = (1,-1) maps to a removable singularity. Large points (x,t/) correspond to points near (u>v) = (1,1), and large points (u, v) correspond to points near x = 4 ± \/8. We shall not be concerned at this time with how to obtain the transformations (3.19) and (3.21). What we shall see, apart from the singularities, is that a Q rational point on the elliptic curve (3.20) leads to a Q-rational solution (u, v) of (3.18), hence to an integral solution (a,6,e) of (3.17) with a and 6 relatively prime. Then a and 6 must both be odd, and so must e. We can thus define an integer tuple (Xy y, Z) by X = i(a2 + e), V = i(a2-e), Z = b\ (3.22) and the only question will be whether X and Y are positive. Unfortunately small solutions like (a,6,e) = (1,13,239) lead to Y negative. We shall return to Fermat's solution to the problem—and to the interpretation in terms of elliptic curves—in Chapter IV. 2. Weierstrass Form, Discriminant, j-invariant A cubic over Jfe in Weierstrass form is given projectively by y2w + a\xyw -f a3yw2 = x3 + d2X2w + a+xw2 -f fl6^3» (3.23a) with coefficients in k. We know that the only k rational point on the line at infinity w = 0 is (x,2/, w) = (0,1,0). Moreover, we saw in §11.4 that (0,1,0) is a nonsingular point and is an inflection point, the tangent line being the line at infinity. Since the behavior at (0,1,0) is so well
2. WEIERSTRASS FORM, DISCRIMINANT, j-INVARIANT 57 understood, we can study much of the behavior of the curve by working with the affine form y2 + axxy + a3y = x3 + a2x2 + a4x + a6, (3.23b) This form has the advantage that the notation is simpler. The significance of the subscripts will be explained later in this section. In the examples in §1, specific plane curves of the form (3.23b) simplified under changes of variables. Actually these changes of variables are valid more generally, under only an assumption on the characteristic of k. The expressions for how the coefficients of (3.23b) are affected by these changes of variables do not depend on the characteristic and are as follows. The notation is absolutely standard: and 62 = a\ + 4a2 64 = 2a4 + aids h = <*3 + 4fl6 68 = a\aQ + 4a2a6 - aia3a4 + a2a\ c4 = b\ - 2464 c6 = -6^-f366264 -21666. (3.24) (3.25) The first simplification of (3.23b) assumes that char(fc) ^ 2. We complete the square in (3.23b), replacing y-f \{p.xx-\-a^) by |y, and the result is y2 = 4x3 + 62x2 + 264x + 66 (3.26) with 62,64,65 as in (3.24). (The coefficient 6g will play a role later in this section and also in §4.) The second simplification assumes in addition that char(Ar) ^ 3. We replace (x, y) in (3.26) by ( ———, —— ), and the result is \ ob 108/ y2 = x3 - 27c4x - 54c6 (3.27) with c4 and c6 as in (3.25). Although our chief interest is in elliptic curves over Q, it is not enough to consider only equations (3.27). The reason lies in what happens when we reduce curves over Q modulo a prime p. The following example illustrates.
58 III: CUBIC CURVES IN WEIERSTRASS FORM EXAMPLE. Cubic case of Fermat's Last Theorem. The Fermat equation (3.3) led to the Weierstrass form in (3.5): y2 - 9y = x3 - 27. (3.28) Further changes of variables then led to the equation y2 = x3-2433, (3.29) which is of the form (3.27). Following Theorem 3.2, we shall see that (3.28) is nonsingular modulo all primes but 3, while (3.29) is nonsingular modulo all primes but 2 and 3. Thus the passage from (3.28) to (3.29) loses information that might be obtained by reduction modulo 2. For any field ky we introduce the discriminant A of the curve (3.23b) or (3.26) by the formula A = -b\b% - 863 - 276j? + 9626466 (3.30) with 62,64,66,6s as in (3.24). When char(fc) is not 2 or 3, we can solve for A from the relation 1728A = c34-c26. (3.31) WARNING. If (3.27) is used as our original equation, its A is 2639(c4* - eg), and this is off by a factor of 212312 from A in (3.31). The factor is introduced by the scaling in passing from (3.23) to (3.26) and then (3.27). Theorem 3.2. The cubic (3.23) is singular if and only if A = 0. This theorem will be proved later in this section. By way of examples, note that A = -39 for (3.28), and A = -21239 for (3.29), so that (3.29) is singular modulo 2 but (3.28) is not. To get at the proof of the theorem, we introduce the discriminant of a cubic polynomial in one variable. Let f(x) = x3 - ax2 + pz - 7 = (x - n)(x - r2)(x - r3) (3.32) be a monic cubic polynomial over k with roots in k. Here a, /?, and 7 are given by the elementary symmetric polynomials ot = ri + r2 + r3, 0 =. rir2 + rxr3 + r2r3y 7 = rir2r3.
2. WEIERSTRASS FORM, DISCRIMINANT, j-INVARIANT 59 We can check that /i i i\ det r2 r2 r3 = (r3 - r2)(r3 - ri)(r2 - n) (3.33a) \r? r2 r|/ (3.33b) where 07 = rj -f r\ + 7*3 for 1 < i < 4. The discriminant d of /(x) is given by d=(r1-r2)2(r1-r3)2(r2-r3)2 (3.34) Proposition 3.3. The discriminant d of the polynomial f(x) in (3.32) is given by (3 <tx a2\ *1 a1 ^3 J , 02 G"3 ^4 / where <ri = a <r2 = a2- 2(3 a3 = a3 - 3a(3 -f 37 04 = otA - 4a2/? 4- 2/?2 + 4<*7. Proof. The determinant formula follows from (3.33) and (3.34). Then 0*1, 0*2> 0*3) 04 are symmetric polynomials in r\yr2,r3 and hence are polynomials in the elementary symmetric polynomials a,/?,7. There is an algorithm for finding the polynomials in a,/?, 7, and application of it yields the above expressions for 0*1,0*2, 0"3,04- These expressions can be verified by direct computation. Corollary 3.4. For the cubic polynomial f(x) = x3 -f px -|- </, the discriminant is d = —4p3 — 27<j2. Proof. This is the special case of Proposition 3.3 in which a = 0, P = p, and 7 = — q.
60 III: CUBIC CURVES IN WEIERSTRASS FORM Example. Cubic polynomial f(x) defined over R. The discriminant d is 0 if and only if / has a repeated root. If the three roots are real, then d > 0. If / has one real root r\ and one pair of complex conjugate roots r2 and r^ = ft, then (ri — T2)(ri — r2) is real and (r2 — f2) is imaginary; since d is the square of the product, d is < 0. In the general case, d = 0 for the cubic polynomial in (3.32) if and only if at least two of the roots are equal. For a cubic polynomial that is not monic, we define d to be the same as for the multiple that is monic. If we then replace x by ^ in a cubic, the discriminant gets multiplied by C6 (since each root gets multiplied by C). The relevance of the discriminant d to detecting singularities of cubics in Weierstrass form is as follows. Proposition 3.5. If C is a nonzero element of k and if char(fc) ^ 2, then the plane curve y2 = C(x3 - ax2 + 0x-y) (3.35) is nonsingular if and only if /(#) = C{x2 — ax2 + 0x — 7) has distinct roots in k. Proof. Since §11.4 showed that there can be no singularity on the line at infinity, Proposition 2.6 says that the curve is singular if and only if there exists a k rational point (xo,yo, 1) on t-ne curve where the following three equations are all satisfied: — : 0 = 3x20 - 2axQ + /? ox h- 2yo = 0 j^- J/o = C{-ax\ + 2/3xo - 3T) Equations (3.35), (3.36a), and (3.36b) are equivalent with 0 = j,0 = f(x0) = f'(x0), (3.37) and (3.36c) is redundant, giving the extra condition 3/(xo) —xo/'(*o) = 0. Thus the only candidates for singular points over k are (xo,0,1), where x0 is a root of /, and such a candidate (xq, 0,1) is singular if and only if Xo is a multiple root of /. (3.36a) (3.36b) (3.36c)
2. WEIERSTRASS FORM, DISCRIMINANT, j-INVARIANT 61 Proposition 3.6. If char(fc) ^ 2, let db and dc be the discriminants of the cubic polynomials on the right sides of (3.26) and (3.27), respectively. Then dc = 2l2312</6 (3.38a) and A = 2Adb. (3.38b) Proof if char(fc) ^ 3. Apart from translations, (3.27) is obtained from (3.26) by replacing x by x/C with C — 62, and we have seen that the effect on discriminants is to multiply them by C6. Thus (3.38a) follows. By Corollary 3.34 and (3.31), we have dc = -4(-27c4)3 - 27(-54c6)2 = 22 • 39 • 123A. Then (3.38b) follows from (3.38a). Proof if char(fc) = 3. It is immediate from Corollary 3.34 that dc = 0, and hence (3.38a) is valid. The discriminant db of (3.26) is the same as that of (3.35) with « = —, /?=y- 7 = -T- Since 3 = 0, Proposition 3.3 says that df, = 2c7"i(T2<T3 — <72 ~ 0"i (T4 = — <Ti(T2<T3 — (72 — G^G^, where <T\ — a (T2 = a2 + /? <73 = a3 (T4 = a4 - o?P - p2 + ay with a = -62, 0 = ~*4i 7 = ~be- Substituting gives db = _/?3 + ^2 _ a37 = 63 + 6262 _ ^ (3 39) Meanwhile, in any characteristic we can check that 468 = 6266-&2, (3.40) an equality that was already used to show that (3.30) implies (3.31). Therefore in characteristic 3, A = -b$bg + 63 = -bibs + b\b\ + b\. Comparing this equation with (3.39), we see that A = db, and (3.38b) follows.
62 III: CUBIC CURVES IN WEIERSTRASS FORM Proof of Theorem 3.2. Case 1. Suppose char(fc) ^ 2. Then (3.23) is singular if and only if (3.26) is singular, if and only if the right side of (3.26) has a repeated root (by Proposition 3.5), if and only if db = 0, if and only if A = 0 (by (3.39)). Case 2. Suppose char(fc) = 2. Then A reduces to A = b\b& + b\ + 62Me = a\dG + a\a^aA + a\a2a\ + a\a\ + a\ + a\a\. (3.41) Meanwhile, just as in the first paragraph of the proof of Proposition 3.5, (3.23) can have singularities only at k rational points (xo,#o> 1) on the curve, and it has a singularity at such a point if and only if -*- : O = aiy0 + *o + a4 (3.42a) ox — : 0 = aizo + a3 (3.42b) oy ■7T- : 0 = yl + aiXoyo + a2xl + a6- (3.42c) Equation (3.42c) is redundant, being the sum of the curve, xo times (3.42a), and y0 times (3.42b). Suppose ai = 0. Then A = 0 if and only if a$ — 0, if and only if (3.42b) holds. To complete this case, it is enough to show that the system Vo = xo + a2^o + a*xo + <*6 0 = xl + a4 has a solution in fc. But we have only to choose xq 6 Jb so that the second equation holds, substitute into the first equation, and choose yo £ £ so that the first equation holds. Suppose ax ^ 0. Then (3.42b) and (3.42a) successively give xq = a]~ «3 and yo = aj~ a3 -f- aj~ a4. Substitution of these values for x and t/ in the difference of the two sides of (3.23b) gives (a^6a$ + a*2^) + (a^a^ + 0^0304) + (ar3fl3 + a^c^) + aj"3°3 + o>\2a20>\ + ar Ia3«4 + a6, and (3.41) says this is just a]~6A. Thus (xo,t/o) satisfies (3.23b), yielding (#o> 2/0) 1) as a singular point, if and only if A = 0. This completes the proof of the theorem.
2. WEIERSTRASS FORM, DISCRIMINANT, j-INVARIANT 63 An admissible change of variables in a Weierstrass equation (3.23b) is one of the form x = t/V -f r and y - u3y' + su2x' + t (3.43a) with it, r, 5, t in k and u ^ 0. In matrix form it is given by x\ (x'\ ( u2 0 r\ y j = fc-1 I j,' 1 with fc"1 = fit/2 u3 / (3.43b) It fixes [(0,1,0)] and carries the tangent w = 0 to the same line. The cubic F in Weierstrass form gets carried to the same curve as a cubic F* still in Weierstrass form. Up to a constant, admissible changes of variables are the most general linear transformations with these properties. We have already used such changes of variables on several occasions to normalize Weierstrass equations in various ways. Two elliptic curves that are related by an admissible change of variables are said to be isomorphic. Under the change of variables (3.43) and the normalization that makes the coefficients of wy2 and x3 be 1, let F get carried to F*. In the special case that r = 5 = t = 0, this passage F —> F* multiplies the coefficients ax ,a3, a2,... ,62, • • • >C4,C6 by powers of u. The significance of a subscript i is that that coefficient has been multiplied by u~%. We say that the coefficient has weight i. More generally when r,syt are not all 0, the above passage F —► F* maps the coefficients a 1,03,02, • • )^2) • • • 1^4, C6 into new coefficients a'],0-3,112, •.. ,&2, • • • ,c4,Cg. Then a primed coefficient is u~~l times an expression in r, s,<,Oj,... ,c6 that does not involve u. In the case of c4 and C6, we have c4 = u~4c4 and c'6 = u"6C6 with r, s,£ not involved. When coefficients are multiplied, their weights add. Linear combinations of terms with the same weight again have that weight. In this sense the discriminant A has weight 12. Since A can be expressed in terms of c4 and ce (at least when k has characteristic not equal to 2 or 3), A is unaffected by r, sy and t.
64 III: CUBIC CURVES EM WEIERSTRASS FORM Elliptic Curve y2 + y = x3 - x2 y2 + xy = x3 — 2x2 + x y2 + y = x3 + x2 + x y2 + xy+y = x3 y2 +y = x3 y2 + xy-y= x3 y2 + y = x3 + x2 — x y2 + xy = x3 + x2 + x y2 + y - x3 + x2 y2 = x3 — x2 + x y2 = x3 + x2 + x y2 + xy + y = x3 - x2 y2 + Zxy -y=x3 y2 + xy = x3 — x2 + x y2 + xy = x3 - 1x + 1 y2 + xy = x3 + x y2 — x3 + x y2 + y = x3 — 5a;2 - 4a; - 1 y2 + Zxy + y = x3-x2 y2 + xy - y = x3 + x2 y2 + y = x3 + x y2 + 4xy - y = x3 A -11 -15 -19 -26 -27 -28 -35 -39 -43 -48 -48 -53 -54 -55 -61 -63 -64 -67 -83 -89 -91 -91 c4 16 1 -32 -23 0 25 64 -23 16 -32 -32 -15 153 -39 97 -47 -48 592 -47 49 -48 352 eel -152 -161 8 -181 -216 -253 -568 235 -280 -224 224 -297 -1917 -189 -1009 71 0 14408 199 -521 -216 -6616 | Table 3.1. Some elliptic curves with small negative discriminant To get some feeling for elliptic curves (i.e., nonsingular cubics (3.23)) without relying on classical examples or random selection, one can look for elliptic curves of small discriminant. These, it turns out, are an infrequent occurrence. They remain nonsingular, when reduced modulo p, for all but a small number of primes p, and thus they supply a good source of examples for many purposes. Tables 3.1 and 3.2 list, up to admissible changes of variables over Q, all elliptic curves with |A| < 100 whose coefficients are all integers < 9 in absolute value. The triple (A,C4,C6) can be used to detect equivalence under admissible changes of variables over Q: It is always true that two elliptic curves over Q with the same (A,c4,c6) are related by a transformation (3.43), since the changes of variables that pass from (3.23) to (3.27) can both be inverted. Conversely if two elliptic curves are related by a transformation
2. WEIERSTRASS FORM, DISCRIMINANT, j-INVARIANT 65 (3.43), their discriminants stand in the ratio of u12, and this is possible with integer coefficients and with |A| < 100 only when t* = ±1. Hence (A,C4,C6) is a complete invariant of the equivalence class for curves limited as in the tables. Elliptic Curve y2 + Ixy + 2y = x3 + 4a;2 + x y2 + 3xy = x3 + x y2 + y - x3 - x y2 + 2xy - 3y = x3 - 1 j/2 + xy = x3 - 3x2 + x y2 + 9xy - 9y = x3 + 9x2 - 5x - 3 y2 = x3 — x y2 + xy = x3 — x y2 + xy = x3 — x2 — x y2 + ixy — y = x3 — x2 y2 = x3 - 2x + 1 y2 = x3 - 2x - 1 y2 = x3 + x2 — x y2 = x3 — x2 — x j/2 + xy = x3 + x2 — x y2 + 5xy + y = x3 A 15 17 37 37 57 62 64 65 73 79 80 80 80 80 89 98 c4 3841 33 48 160 73 15873 48 49 57 97 96 96 64 64 73 505 °6 -238049 -81 -216 -2008 539 -1999809 0 -73 243 -881 -864 864 -352 352 -485 -11341 Table 3.2. Some elliptic curves with small positive discriminant For an elliptic curve the j-invariant is defined by j = c3/A. (3.44) This is well defined by Theorem 3.2, and it has weight 0. It is unaffected by r, 5, and t and thus is invariant under all admissible changes of variables. Examples. Consider y2 = x3 +px -f q over a field with characteristic not equal to 2 or 3. We have C4 = —48p and ce = —864<j. (See the Warning before Theorem 3.2.) By (3.31),
66 III: CUBIC CURVES IN WEIERSTRASS FORM Thus (3.44) gives '' = 1728vw (345) Thus, for example, every elliptic curve y2 = x3 + ax has j = 1728; this includes those in Subsections C, D, and E of §1. Similarly every elliptic curve y2 = x3 + a has j = 0; this includes the one in Subsection B of §1. Finally the Diophantus example in Subsection A of §1 was y2 = x3—x+9. 2s • 33 This has p = — 1 and q — 9. Thus (3.45) gives j = — . zloo Proposition 3.7. Suppose that k has characteristic not equal to 2 or 3. (a) If two elliptic curves are related by an admissible change of variables, then they have the same j-invariant. (b) If jo € A: is given, then there exists an elliptic curve over k with j-invariant j0. (c) If k is algebraically closed and two elliptic curves have the same j-invariant, then they are related by an admissible change of variables. PROOF, (a) This was observed above. (b) The cases jo = 0 and j0 = 1728 were handled in the examples above. For other values we can specialize the above example to „2 _ 3 27 Jo „ 27 Jo y x —t~~—r^oo"x ~ 4 jo-1728 4 jo-1728' 27 in in which p = q = — — - r=^r, and then we can apply (3.45). 4 jo — 1728 (c) We can normalize the two curves to be y2 = x3 + Ax + B and y2 = x3 -f A'x + 5', by (a). Here A has weight 4 and B has weight 6. Thus if we put y = i//u3 and x = x*/u2 in the first equation, we get y2 = x3 + Au4x + Bue. What we want to arrange is that A' = Au4 and B'= Bu6. (3.46) Equality of j-invariants means that AA3 AA'3 AA3 -I- 27B2 AA'3 + 21B'2 '
3. GROUP LAW 67 from which we see that A3B'2 = A'3B2. (3.47) Now we distinguish cases. (i) A = 0. Then B ^ 0 by nonsingularity, and A' = 0 by (3.47). If we take u = (B'/S)1/6, then (3.46) holds. (ii) B = 0. Then A / 0 by nonsingularity, and B' = 0 by (3.47). If we take u = (yl'/^)1/4, then (3.46) holds. (iii) AB ^ 0. First we take u = (yl'/yl)1/4 to get 4' = ,4t/4. Then (3.47) says B'2 = (A'/A)3B2 = u12£2. Hence fl' = 5u6 or B' = -Bu6. In the first case, (3.46) has been verified. In the second case we replace u by y/^lu and recompute to find that (3.46) holds. 3. Group Law For a nonsingular cubic curve with a specified k rational point 0, we noted in Chapter I that the operation P + Q = O • PQ> given in terms of the chord-tangent composition law of §11.4, makes the points on the curve into an abelian group with O as identity. In particular, the conclusion applies to elliptic curves, which are nonsingular cubics in Weierstrass form. In this section, we shall state this result as a theorem and give a geometric proof. Theorem 3.8 (Poincare). Let F be a nonsingular cubic, and let O be in F(k). Then the operation P + Q = O(PQ), given in terms of the chord-tangent composition law, makes F(k) into an abelian group with O as identity and with negatives given by — P = (OO)P. If A' is a field extension of kf then the inclusion F(k) C F(K) is a group homomorphism. If a different base point O' is chosen, then the two operations are related by P+'Q — P + Q — O', and the group structures are isomorphic. The heart of the matter is to prove associativity, whose essence is captured by the following lemma. We shall prove the lemma after showing how the lemma implies the theorem. Lemma 3.9. For any points P.Q.P'.Q' in F(k), (PP')(QQf) = (PQ){P'Q'Y (3.48)
68 III: CUBIC CURVES IN WEIERSTRASS FORM Proof that Lemma 3.9 implies Theorem 3.8. We saw in Chapter I that + is commutative, that O is an identity, and that —P is an additive inverse. For associativity the lemma gives the second equality of P(0(QR)) = (PQ • Q)(0 • QR) = (PQ • 0)(Q • QR) = (0(PQ))R. Thus P{Q + R) = {P + Q)R. If we apply O to both sides, we get P + (Q + R) = {P + Q) + R, as required. Thus F(k) is an abelian group under -K Clearly the inclusion F(k) C F(K) is a group homomorphism. Let O' be given. Then the lemma gives the second equality of (P + Q) - O' = 0[(0 • PQ)(00 • O')) = 0[(0 • 00)(PQ • 0')] = 0[0(P+'Q)] = P+'Q. (3.49) Define amap^: (F(Jb), +) -♦ (F(*), +') by p(P) = P - O'. Then y?(P + Q) = (P + Q)-0, = P-f,Q by (3.49). Thus y> is a homomorphism. It is clearly one-one and onto. Overview of proof of Lemma 3.9. To prove (3.48) and thus Lemma 3.9, we shall divide matters into a nondegenerate case and a degenerate case, and we shall use quite different methods for the two cases. The distinction between "nondegenerate" and "degenerate" will depend on which, if any, of the 8 points P, P', Q, Q', PQ, P'Q', PP\ QQ' are equal to one another. Specifically we arrange these points as 8 of the entries of a 3-by-3 matrix P P' PP' \ Q Q' QQ' (3.50) PQ P'Q' ) We say that we are in the nondegenerate case if none of these points is equal to a point in a different row and column (e.g.r P should not equal Q;, QQ\ or P'Q'). Otherwise we are in the degenerate case. The proof in the nondegenerate case simplifies a little if the 8 points are actually distinct, but it does not simplify much.
3. GROUP LAW 69 Proof of (3.48) in the nondegenerate case. Let L\,L'2,L'Z be the lines that meet F(k) at the points of the respective rows of (3.50), with the usual convention that if a point appears more than once in a row, then the line is supposed to be tangent to the curve at the point. Let L",!^',//?} be the similar lines determined by the columns of (3.50). Figure 3.1 illustrates matters, showing L\> Lf2,L'3 as solid lines and L"yL'2,Lf3 as dashed lines. Figure 3.1. Configuration for {PP')(QQ') = {PQ){P'Q') If we are thinking of L'lyL'2i I3, it is natural to include {PQ)(P'Q') as the lower right entry of (3.50), while if we are thinking of L", L'2', ^3> ^ is natural to include (PP')(QQf) as the lower right entry. Our objective, of course, is to prove that these two candidates for the lower right entry are equal. Thus, in referring to (3.50), we shall regard the lower right entry as blank.
70 III: CUBIC CURVES IN WEIERSTRASS FORM We claim as a consequence of our nondegeneracy assumption that each L\ is different from each L'J. (3.51) Thus L-(fc) and L"(k) meet only in the (ij)th entry of (3.50) unless i = j = 3. To see (3.51), first suppose i and j are not both 3, say i ^ 3. If L'{ = L", then the other point(s) of the ;th column must occur as other point(s) of the ith row, since the ith row gives all points of F(Jb)nL{ with their multiplicities, and this occurrence of a point in two such entries of the matrix would contradict nondegeneracy. If« = j = 3 and L'3 = L3, then PQ in L'z must occur as one of the three points of F(k) fl L'3\k)y counting multiplicities. It might be the missing lower right entry of (3.50), but then P'Q' would have to coincide with PP' or QQ'y and we would again have a contradiction to nondegeneracy. Let T be the point where L'z and L% meet; this is unique by (3.51). We shall prove (3.48) by showing that both sides of (3.48) are equal to T. In the proof we may assume that k is algebraically closed. (Actually we shall use only that k is infinite.) Let V and L" be the cubics given by L' = L[L'2L'3 and L" = L'/L'^'. (3.52) The main step of the proof will be to show that aF + a'L' + a"L" = 0 (3.53) for suitable a}a'} a" in k with a / 0. To define these constants, let R! be a point of L[(k) other than P> P;, and PP' (possible since k is infinite), and let R" be a point not in L'(k) (existence by Lemma 2.1). Choose a,a',a" in k not all 0 such that (F(R') L'(R') L"(R')\(a\_(0\ \F(R") L'(R") L"(R"))y°„J-{0)' {6™> and put F0 = aF + a'L' + a" L". (3.55) We shall prove that F0 = 0. In doing so, we shall make repeated use of two facts about intersection multiplicities of lines and plane curves. If L is a line, p is a point, and Gi and G2 are plane curves, then i(p, L, dd) = i(P, L,Gi) + i(P, L, G2). (3.56)
3. GROUP LAW 71 If, in addition, G\ and G2 have the same degree and if G\ + Gi is not 0, then i(p,L,Gi-fG2)>min{i(p,L,Gi),2(p,L,G2)}. (3.57) These are obvious from the definitions. Thus suppose Fo / 0, so that Fo is a cubic. Let L be one of the lines 1*1,1*2, Lf3,Lf{, L'2, £3 and suppose the point p is repeated N times in the corresponding row or column of (3.50), with the lower right entry not included. We shall show that i(p,LiF0)>N. (3.58) In fact, without loss of generality, suppose L is a factor of the cubic V. We have i(p,L,F)> N (3.59) by construction and by definition of the chord-tangent composition law. In addition, Proposition 2.8 gives t(p, !,//) =+00. (3.60) The point p appears in N columns of (3.50) and thus lies on > N of the lines L'{,L'2\L%. By (3.56) i{p>L,L")> N. (3.61) Putting (3.59), (3.60), and (3.61) together with the aid of (3.57), we obtain (3.58). Equation (3.54) implies that FQ(R!) = 0, hence that i(R',L[,FQ) > 1. Adding this inequality to (3.58) for L = L\ and for p equal to the distinct points in the first row of (3.50), we obtain £^p *(p, L[,Fo) > 4. From Corollary 2.20 we conclude that L\ divides Fq. Let us write Fo = L\C for a conic G. Suppose p appears N > 1 times in the second row of (3.50) but not at all in the first row. Then i{p,L'2,L\) — 0 since the first row lists all points of F(k) n L[(k). And 2(p,^2,F0) > N by (3.58). Hence (3.56) gives i(p,L'2,C)>N. (3.62) We shall show that (3.62) remains valid even if p does appear in the first row. In this case, N = 1 by the nondegeneracy assumption. Suppose that p appears > 2 times in the ;th column, possibly including once in the second row. By (3.58), we have i{p,L'-,F^) > 2. Since i{p,L'j,L\) = 1 by (3.51), (3.56) gives i(p,L'/,C) > 1. That is, p is in
72 III: CUBIC CURVES IN WEIERSTRASS FORM C(k). Hence i(p, L'2yC) > 1, and (3.62) is proved even when p appears in the first row. Summing (3.62) over the distinct points p of the second row of (3.50), we obtain £p i(pyL2yC) > 3. From Corollary 2.20 we conclude that L2 divides C. Thus we can write F0 = L[L2M for a line M. Now consider the two points of the third row of (3.50). First suppose they coincide; call their common value p. By non degeneracy, p cannot occur in the first or second row, so that i(p,L'3} L\) = 0 and i(p, £3,^2) = 0. Since (3.58) gives i(p} L'3)Fo) > 2, we conclude from (3.56) that i(p,L'3)M) > 2. By Corollary 2.20, L3 divides Af. Next suppose that the two points of the third row of (3.50) are distinct. Let p be one of them, say the one in the jth column. If p appears N times in the ,7th column, (3.58) gives i(p,L'!,F0)>N. On the other hand, i(P>L'/,L\L'2) = N~\ by (3.56). Another application of (3.56) then shows that i(p, L", M) > 1. Thus p is in M(ib). Since the same argument applies equally to the other point on the third row of (3.50), we see that L3 and M have two distinct points in common. Therefore L3 divides M. Thus L3 divides M in either case, and we conclude that Fq = cL^L2L3 = cL for a nonzero constant c. But F0{R") = 0 by (3.54), and L'(R") ^ 0 by definition of R". We have arrived at a contradiction, and we conclude that FQ = 0. This proves (3.53). We still have to show that a ^ 0 in (3.53). If a = 0, then we may assume by symmetry that a' ^ 0, hence that V divides L", Thus L[ divides L'{L2L3y and unique factorization shows that L[ is the same line as some L'', in contradiction to (3.51). We conclude that a ^ 0. Therefore F = c'L' + c"L" (3.63) for suitable c' and c" in k. Recall that T is the common intersection of I3 and L'3'. It follows from (3.63) that F(T) = 0. Since T is in F(*)nL'3(fc), T is one of PQt F'Q\ {PQ){P'Q'). (3.64a) Since T is in F(k) O L3(ib),
3. GROUP LAW 73 T is one of PP\ QQ', (PP')(QQ'). (3.64b) Since, by nondegeneracy, neither of the first two points of (3.64a) can equal either of the first two points of (3.64b), we conclude either that (PQ)(PV) = T = (PP')(QQ') (3.65) and we are done, or that one of the following four possibilities occurs: PQ = T= (PP')(QQ') (3.66) P'Q' = T= {PP')(QQ') PP' = T = (PQ)(P'Q') QQ> = T=(PQ){P'Q'). These four formulas are symmetric under permutation of notation, and it is enough to handle the first one. Thus suppose (3.66) holds. Consider (PQ)(P'Qf). This is on L'3, hence on V. Also it is on F. Since c" ^ 0 in (3.63) (F being nonsingu- lar), (PQ)(P'Qf) must be on L". Hence (PQ)(P'Qf) is on at least one ofiy, L£, and L'3'. Suppose (PQ)(P'Qf) is on L'{. Since it is on L'3) we must have (PQ)(P'Q') = PQ. Substituting into (3.66), we obtain (3.65), and we are done. Suppose (PQ)(P'Qf) is on L'{. Since it is on L3) we must have (PQ)(P'Qf) = P'Q'. (3.67) If P'Q' — PQ, we can substitute into (3.66) and obtain (3.65), and then we are done. Thus suppose P'Q' ^ PQ. From (3.67) it follows that i(P'Q'yL'3iF) = 2. (3.68) Now PQ is on L'3' by (3.66), and it is on L'( automatically. Thus i(PQ,L'3,L")>2 by (3.56). Since i(PQ} L'3)L') = +00, (3.63) and (3.57) give i(PQ,L'3>F)>2. (3.69) Combining (3.68) and (3.69), we obtain £p i(pt L'3, F) > 4. By Corollary 2.20, we find that L'3 divides F} in contradiction to the nonsingularity ofF. We conclude that (PQ)(P'Qf) is on ig. Since it is on both L'3 and L's, (3.51) shows it is T. In combination with (3.66), this conclusion yields (3.65). This completes the proof of (3.48) in the nondegenerate case.
74 III: CUBIC CURVES IN WEIERSTRASS FORM Proof of (3.48) in the degenerate case. The assumption is that two of the 8 points in (3.50) in different rows and columns are equal. Up to symmetry the cases we need to consider are at the most P = Q' P = QQ' (and hence PQ = Q1) PP' = PQ (and hence P' = Q). The identity (3.48) reduces in these cases to (PPf)(QP) = (PQ)(P'P) {PP')P = Q'(P'Q') (PP')(PiQt) = (PP')(P'Qf). The first and third of these are trivially valid, and the second is valid since both sides reduce to P'. This completes the proof of (3.48) in the degenerate case. 4. Computations with the Group Law Fix an elliptic curve E over fc, i.e., a nonsingular cubic in Weierstrass form (3.23). It is customary to choose O = (0,1,0), the point at infinity, and then Theorem 3.8 says that E(k) becomes an abelian group with O as identity element. This particular choice of O has some immediate implications: (1) OO = O since O is an inflection point. (2) The additive inverse, usually given by — P = OO • P, is now given by -P = OP as a consequence of (1). Then P = (*o>yo) implies - P = (x0,-y0 - aix0 - a3)} (3.70) because the line from O to P is x = x0 in affine form (or x — p^w = 0 in projective form). (3) If a\ = as = 0, so that E is given in affine form by y2 = f(x) with / a monic cubic polynomial, then (3.70) specializes to the statement that P=(zo,2/o) implies - P = (x0, -y0)- (3.71) In this case the elements of order 2, which have P = —P) are the points on the x-axis. There are at most three such points.
4. COMPUTATIONS WITH THE GROUP LAW 75 Let us derive a formula for the group law for E(k). Let P\ and P2 be given with P2 ^ -Pi- We shall calculate P3 = Px + P2. The notation will be Pi = (xi,yi), P2 = (^2,2/2), P3 = (z3,y3) _ , _ f line through Pi and P2 if Pi ^ P2 (chord case) "" \ tangent line at Pi if Pi = P2 (tangent case). We begin by calculating m and b. The result is {Vi ~y\ x2 - x\ 3sf + 2a2gi 4-04- Qiyi 2yi -f aixi +03 ( 3/1^2 ~y2^i ^2-^1 6 = I I -af + fl4^i + 2a6 - a3yi I 2yi + aiXi + a3 The formulas for m and 6 in the chord case are simply formulas for the line through Pi and P2 and do not involve cubics. To prove the formulas in the tangent case, we begin by computing m = ^ by implicit differentiation of (3.23b): 2y</ + axxy' + aty + azy' = 3x2 + 2a2x + a4. Putting y' = m and (x,y) = (2:1,2/1), we obtain (3.72a) in the tangent case. Along the line y = mx + 6, we have y — t/i = m(x — x\). Thus y = mx + (yi — mxi), and we obtain b = y\ — mx\. In other words . 2y? 4- aixiyi + a2y\ - 3x? - 1a2x\ - a4Xi -f aiXiyi 0 = . 2yi -f a^i + a3 Substitution for y\ from (3.23b) then gives (3.72b) in the tangent case. Now we can calculate P3 = (x3,ys) = Pi + P2. Let F(xyy) be the difference of the right side and the left side of the Weierstrass equation (3.23b). Along the line y = mx+b, this expression becomes P(x, mi+6), and it is clearly a monic cubic polynomial in x. Since Pi, P2, and PiP2 are on this line and are in E(k) and since (3.70) shows that x3 equals the x coordinate of P1P2, all of xi,x2ixs are roots of F(x,mx + 6) = 0, in chord case in tangent case (3.72a) in chord case in tangent case. (3.72b)
76 III: CUBIC CURVES IN WEIERSTRASS FORM with multiplicities counted. Thus F(x, mx + 6) — (x — x\)(x — Xi)(x — x3). Substitution of the expression for F gives x3 + «2^2 + ^4^ -f a6 — (mar + 6)2 — aix(mx + 6) — 03(7721 -f 6) = x3 — (xi + x2 + £3)z2 + lower order. We can equate the coefficients of x7 on the left and right, obtaining a2- m2 - aim = -(xi + x2 4- £3), and the result is 2 23 = m + a^ — a2 — X! — X2 2/3 = -(m + ai)^3 - b - a3 > in chord and tangent cases. (3.73) A special case of (3.73) is that P\ — i^j so that we are in the tangent case. If P — (x,y), then substitution of (3.72a) into (3.73) yields a formula for the x coordinate x(2P) of 2P: X(2P) = 4*3+ 62*2 +264*+ 66' (3J4) where 62,64,66,63 are as in (3.24). Example 1. Cubic case of Fermat's Last Theorem. This was discussed in Subsection B of §1. Since Fermat's Last Theorem is valid in degree 3, (3.3) has only trivial solutions. Transforming to (3.7), we see that „,2 _ 3 27 V = x - -4 has only (3,|), (3,-§), and 00 as its Q solutions. Thus the group of solutions must be isomorphic to Z3. Let us verify that P = (3,^) has order 3. We have ai = a3 = a2 = a4 = 0 and ae = — 2Z. Thus 62 = 64 = bs = 0 and &6 = —27. Hence (3.74) gives 81 + 162 x(2P)=*4 + Mx ,=3 108-27 = 3. 4x3 - 27 I So 2P = P or 2P = -P. The conclusion 2P = P is false because it forces P = O = 00. Thus IP = -P and 3P = O. In other words, P indeed does have order 3.
5. SINGULAR POINTS 77 EXAMPLE 2. y2 + y = i3-z over Q. This is one of the elliptic curves in Table 3.2, and it has A=37. The point P = (0,0) is an obvious solution, and we can calculate from (3.73) that P=(0,0) 5P = (i,-|) 2P = (1,0) 6P = (6,14) 3P = (-1,-1) 7P = (-f,£) 4P = (2,-3) 8P = (&-&)• (3.75) It will follow from Theorem 5.1a that P generates an infinite cyclic subgroup. (In fact, P generates all solutions.) 5. Singular Points Singular Weierstrass curves arise for certain primes p when a Weier- strass curve with integral coefficients is considered modulo p. Such curves are easy to analyze, and we shall note some of their features in this section. Let E be a singular Weierstrass curve over a field k. The point oo on the curve is nonsingular, and we have only to analyze points (x,y) in the affine plane. Proposition 3.10 will show that there is only one singularity and that, under a mild restriction on k, it occurs at a k rational point (xo,yo)- Translating (zo>yo) to the origin (an admissible change of variables), we are led to the projective curve (3.23a) with a6 = 0. The condition that — give 0 at (0,0,1) means that a4 = 0, ox and the condition that -r- give 0 at (0,0,1) means that az = 0. Thus oy ] E is given in affine form by the equation We can factor y2 + aixy = x3 + a2x2. (3.76) y2 +aixy - a2x2 (3.77) over Jb, obtaining (y - otz)(y - 0x) = x3 with a, f3 € k. (3.78) We say that the singular point (0,0) is a cusp if a = /?, or a node if a ^ p. Pictures of the two kinds of behavior (with ax = 0) appear in Figure 1.6.
78 III: CUBIC CURVES IN WEIERSTRASS FORM Proposition 3.10. For a singular Weierstrass curve E over Jfc, there is only one singular point (x0,yo)- It is k rational if either (i) char(ifc) ^ 2 or (ii) char(Jb) = 2 and k is closed under the operation of taking square roots (as is the case when k is a finite field of characteristic 2). The point (xo> 2/o) is a cusp if c4 = 0, or a node if c4 ^ 0. Proof. If char(ib) ^ 2, we can apply, without loss of generality, a projective transformation to eliminate the d\ and a^ terms. By Proposition 3.5 the curve will be singular if and only if the cubic polynomial / in x has a repeated root. In this case / and /' have a greatest common divisor g over k with degree > 1, and the singular points are (xo>0), where xq ranges over the roots of g. If g has degree 1, its unique root xo is in ky and (xo,0) is the unique singular point. If g has degree 2, its two roots x0 and x0 must be equal, since otherwise Xo and x'0 would both be roots of multiplicity > 2 for the cubic polynomial /. Since we can conclude xo = ^o, ^o is in k, and (xo,0) is the unique singular point. If char(fc) = 2, the singular points are the points (xo>2/o) on the affine curve over k satisfying (3.42a) and (3.42b). If ai ^ 0, (3.42b) uniquely determines xo, and (3.42a) then uniquely determines t/oj the resulting (x0,3fo) >s k rational. If c^ = 0, (3.42a) forces x2+4 = 0- In characteristic 2, square roots are unique, and the assumption (ii) says that xq is in k. Since (3.42b) shows a3 = 0, t/o is given by 2/0 = *0 + °2^o + a*x0 + «6 and again exists in k and is unique in k under our assumption (ii). Return to general k. Under the translation over k that moves (xo, j/o) to the origin, c4 is unaffected. Thus it is enough to decide cusp vs. node in (3.76). From (3.24) and (3.25), the value of c4 is c4 = b\ - 2464 = {a\ + Aa2f - 24(2a4 + Gla3) = {a\ + 4a2)2. (3.79) If char(Jfc) ^ 2, the discriminant of (3.77) is a\ + 4(*2, which is 0 if and only if c4 = 0; hence a = /? (and there is a cusp) if and only if c4 = 0. If char(ib) =. 2, then (y - ax)2 = y2 + axxy - a2x2 says a\ = 0 and a2 — a2; hence a = p (and there is a cusp) if and only if a\ = 0, which happens if and only if c4 = 0.
5. SINGULAR POINTS 79 We can parametrize our singular Weierstrass curve (3.76) by the method that Diophantus used for conies (see Chapter I): We parametrize lines through the singular point (0,0) by their slopes, and we find that most lines intersect the curve at one and only one other point. To handle exceptional lines, it is necessary to distinguish two subcases of nodes. Let us say that a node is a split case if the members a and /? of (3.78) lie in k. In the contrary case, a and /? lie in a nontrivial quadratic extension of k} and we say that the node is a nonsplit case. Proposition 3.11. For the singular Weierstrass equation E in (3.76), the map t —> (t2 + axt - a2, t{t2 + axt - a2)) carries k - {<*,/?} one-one onto E(k) — {oo} - {(0,0)}. If A: is a finite field with \k\ elements, the nonsingular set E(k) — {(0,0)} therefore has |Jb| — 1 elements if (0,0) is a split-case node |Jfc| + 1 elements if (0,0) is a nonsplit-case node |ik| elements if (0,0) is a cusp. Proof. In (3.76), x = 0 gives only y = 0, and (0,0) is singular. Thus any nonsingular point of E(k) — {oo} has y — tx for a unique member t of k. Substituting tx for y in (3.76) and using x / 0, we are led to t2 + a\t — a.2 = x. Then also y is t times this expression for x, and k maps onto the affine solutions of (3.76). To exclude x — 0 from the image, we must exclude the roots of i2 -+■ a^t — a2\ these are a and /?. Once x — 0 is not in the image, the map is one-one since t is recovered as y/x. The numerology if A: is a finite field is then clear.
CHAPTER IV MORDELL'S THEOREM 1. Descent MordelPs Theorem (Theorem 4.11 below) is the statement that the group E(Q) is finitely generated if E is an elliptic curve ove/ Q. The proof is motivated by a close examination of Format's method of descent. To prove that x4 -f y4 = z4 has no nontrivial integer solutions, Fermat showed that a nontrivial integer solution of u4 + v4 = w2 leads to a strictly smaller nontrivial integer solution of the same equation, and he thereby arrived at a contradiction. In this section we shall look at Fermat's proof in detail, and then we shall translate it into an argument about the elliptic curve y2 = x3 — Ax via the transformations of Subsection D of §111.1. It will turn out that the descent procedure passes from the point P to a point \P (or possibly — \P) on the elliptic curve. With MordelPs Theorem in hand, we know that the passage P —► ^P cannot go on forever. But, in fact, the argument can be turned around to give a proof of MordelPs Theorem. The passage P —► \P has certain obstructions, which are measured by E(Q)/2#(Q). (In Fermat's proof these obstructions correspond to an occasional need to adjust the parameters in apparently trivial ways.) The hard step in the proof of MordelPs Theorem will be to prove that Z?(Q)/22?(Q) is finite, so that the descent passage P —* \P can proceed for all but a finite number of situations. The other step in the proof is to codify the notion of a "smaller" solution of it4 + v4 = w2y or of a "simpler" solution of y2 = x3 — Ax. This is done by introducing a suitable notion of height such that only finitely many points on the elliptic curve have height less than any given constant C. For the notion of height h(P) that we introduce, we shall have h(^P) = \h(P), so that the descent process P —► \P eventually carries us to height less than C. If C is taken larger than the maximum of h{P) for P in a set of coset representatives for the finite group £(Q)/2i?(Q), then the finite set of points of height less than C will turn out to be a finite.set of generators for £(Q). Let us begin with Fermat's argument. We use the symbol Q to denote a square. 80
1. DESCENT 81 Proposition 4.1 (Fermat). u4 + v4 = w2 has no nontrivial integer solutions. Proof. Arguing by contradiction, suppose we have a nontrivial integer solution. Without loss of generality, we may assume GCD(u, v) = 1, it is odd, and v is even. Writing and applying Theorem 1.2, we obtain integers m and n with u2 = m2-n2, v2 = 2mn, GCD(m,n) = l. Applying Theorem 1.2 to m2 = u2 + n2, in which w is odd, we obtain integers p and q with rn = p2 + ?2, ti = p2-g2, n = 2pqy GCD(pyq) = 1. Since t; is even, we can write (£) =D = 2mn = w(p2 + *2)- Here p, 9, and p2 -f q2 are relatively prime since GCD(p, q) = 1. Thus p=D. ?=D. p2 + 92 = D- and we can write p=r2, <? = s2, r4+j4 = Q Thus we have been led from the old solution u4 + v4 = Q to a new solution r4 + s4 = QJ. These are related by (!)2=P?(p2 + ?2) = r2SV + A hence by v = 2rs\/r4 + s4. (4.1) If the new solution were trivial, then rs would be 0, so that v would be 0; this would mean that the old solution was trivial. We conclude that rs ^ 0, and then it is clear from (4.1) that r < v and s < v. Hence max(|r|, \s\) < max(|u|, |t>|). The passage from an old solution to a new solution therefore cannot go on forever, and we arrive at a contradiction.
82 IV: MORDELL'S THEOREM Analysis of proof. Let us go backwards. We have (r,*) — (p,9) = (rV) — (m,n) = (p2 + q2,2pq) = (r4 + s4,2r2s2) (u,v) = (p2 - q2,v) = (r4 - s4,2rs\/r4 + s4). (4.2) Now let us match parameters with what is happening for the corresponding elliptic curve. Schematically we have (x,y) on cubic ► (u}v) on quartic i (c,d) on cubic < (r»5) on quartic The equations that are satisfied by these variables are (4.3) y2 = x3 - Ax d2 = c3- 4c U4 + V4 = W2 r4+ *4 = Q We seek an expression for x in terms of c. Let X = u/v and Y = w/v2, so that y2 = X4 + 1. Then (3.14) says that the top line of (4.3) is implemented by 2x y = t/2 + 8g 4x2 J > and < x — Y-X2 AX Y-X2' Let R — r/s, so that ^=rdy=c3-4c"=c2- V2c,/ 4c2 4c (4.4) Then Y-X' = _ w - u2 _ (m2 + n2) - (m2 - n2) 2rV 2mn 2R2 m r* + s4 R4 + ly
1. DESCENT 83 and hence Y-X2 #> /c2-4\ 4c(c2-4)' l ^ Interpretation of analysis. In the notation of (3.23) and (3.24), our elliptic curve has a\ = as = a.2 — a$ = 0 and 04 = —4. Thus 62 = £6 = 0, 64 = -8, 68 = -16. The point (c, d) is on the elliptic curve, and thus c = x(P) for a point P. Then (3.74) says that and this expression coincides with the expression (4.5) for x. Computation shows also that y(2P) agrees with y except for a factor of sgn c. Thus Fermat's method of descent amounts to a construction that starts with P' in £(Q) and constructs ±\P* in E(Q). Closer inspection shows that we made certain adjustments to our variables as we went along, in order to be able to carry out the descent. The first one was to assume that u is odd and v is even, rather than the other way around. This is the first adjustment. The next assumption was that u and v are positive; if u and v are not positive, we adjusted them to make them positive. We chose r and 5 to be the positive square roots of p and qy but we could not force r to be odd and 5 to be even. Normalization of r and 5 to be of the same form as u and v was a second instance of the first adjustment. In the context of the elliptic curve, it turns out that both adjustments amount to changing the point we are studying by a coset representative of £,(Q)/2E(Q). The fact that at most two adjustments are needed reflects the fact that E(Q)/2E(fy) is a sum of only two copies of Z2 in this example. Finally we can ask why the descent has to stop. For the quartic equation this was clear, since the integer entries of the solution were going down. Let us translate this condition into the context of the elliptic curve. For a rational number j in lowest terms, we define = logmax(|i|l|j|).
84 IV: MORDELL'S THEOREM Since r and s in (4.2) are relatively prime, we have = logmax(|r|,|5|). The construction of r and 5 had the property that \R)00 = \-\ <|-| =1X1^. I 5 loo llMoo With the quartic equation, the descent has to stop because only finitely many rationals have | • |oo less than a given constant. To work directly with the elliptic curve, we first re-examine (4.1) and see that in fact I 1 V loo I Now R is given in terms of c by (4.4), and a little checking shows that ||fl|co-|cU<M (4.7a) for some computable constant M. Similarly |Moo-|*|oo|<M. (4.7b) Putting together (4.6) and (4.7), we obtain \c\~<\\*U + \m. (4.8) Thus Moo < Moo as long as ZM < \x\w. (4.9) The values of |c|oo and \x\oo are a naive version of the notion of height mentioned at the beginning of the chapter. Condition (4.9) does not say that the descent must stop, but it does say that it has to lead to a (nontrivial) solution of the elliptic curve with small numerator and denominator. Although this conclusion does not rule out all nontrivial solutions for this example immediately, we shall see that it does give enough information for the conclusion that £(Q) is finitely generated. To get a cleaner argument, we ultimately will use a more refined definition of height, and inequality (4.8) will not have the additive constant on the right side.
2. CONDITION FOR DIVISIBILITY BY 2 85 2. Condition for Divisibility by 2 The proof of Mordell's Theorem will involve showing that £(Q)/2£(Q) is finite and will involve using a suitable notion of height to prove that the descent must stop. In this section we prove a theorem that allows us to recognize elements of 2£(Q). We continue to use the symbol Q to denote a square. Theorem 4.2. Let E be an elliptic curve over a field of characteristic not equal to 2 or 3. Suppose E is given by y2 = (x — a)(x — (3){x — 7) = x3 -f rx2 -f sx -M with or,/?,7 in k. For (2:2,2/2) in E(k)} there exists (£i,yi) in E(k) with 2(^1,2/i) = (#2)2/2) ^ an(J only ^ #2 — <*> #2 — /?> and x2 - 7 are squares in fc. Proof of necessity. If (£i,yi) exists, let y = mz+6 be the tangent line at x\. Since (£i,yi) and (£2,— 2/2) both satisfy (x - a)(z - /?)(x - 7) = y2 = (rnx -f &)2 and since y = mi 4- 6 is actually tangent at x\y the three roots of x (x - a)(x - 0){x - 7) - (mi -I- 6)2 are £2, x\, and £1. Thus (x - a)(x - /?)(x - 7) - {mx + 6)2 = (x - x2)(x - xx)2. Putting x = a, we have — QJ = (a — ^2)[]]- Since Xi cannot be or, the square on the right is not 0. We conclude £2 — a is a square. Similarly £2 — 0 and £2—7 are squares. Preparation for sufficiency. Reduction of quartics to cubics. If we have a monic quartic polynomial equation in one variable to solve, we can translate x to eliminate the x3 term. Then we have £4 + ax2 + bx -f c = 0,
86 IV: MORDELL'S THEOREM With u as an unknown to be specified, (4.10) is equivalent with (x2 + u)2 = (2u - a)x2 - bx + (u2 - c). (4.11) We choose u so that the right side of (4.11) has a double root in x. Then 0 = B2 - AAC = b2- 4(2u - a)(u2 - c) = b2 - 8u3 + 4at/2 + 8cu - 4ac, and hence 8u3 - Aau2 - 8cu + (4ac - 62) = 0. If we solve for u and substitute into (4.11), the result is Q = QJ. We take square roots and are led to a quadratic equation for x. Proof of sufficiency. Changing variables in x, we may assume X2 = 0. We are given y2 = (x - a)(x - (3)(x - 7) = x3 + rx2 + sx + t with t/2 = -a/?7 = * (4.12a) and with —a = a2, —/? = /?2, —7 — 72, and we are to produce (xi,j/i). Without loss of generality, we may adjust the signs of <*i,/?i,7i so that *i0i7i = !fe. (4.12b) Take 3/ = mx + 2/2 as the line through (0,2/2) tangent at the unknown point (xi,t/i). Then the three roots of (x - a)(z - P){x - 7) - (mx + y2)2 (4.13) are 0, x1? and xj. So ~(x3 + rx2 -f sx — m2x2 — 2my2x) is to be (x — xi)2. Thus the discriminant of x2 4- (r - m2)x + (s - 2my2) (414) is to be 0: (r-m2)2 = 4(s-2rm/2). (4.15)
2. CONDITION FOR DIVISIBILITY BY 2 87 This is a quartic equation in m. If it has a root m0 in Jfc, then x\ = \{m\ — r) is a double root of (4.14) and hence also of (4.13). Consequently 2(xi,ra0xi + y2) = (0,-y2), and 2(xi,-m0xi - y2) = (0,y2). In other words (x\yy\) exists with 2(xi,yi) = (z2,y2). Thus it is enough to show that (4.15) has a root in k. As in the preparation for this part of the proof, let u be an unknown to be specified. From (4.15) we have (m2 -r + u)2 = 2um2 - 8y2m + (u2 - 2ru + 4s). (4.16) We try to find u such that the right side is a square as a function of m. Thus 0 = B2 - \AC = 64y| - $u(u2 - 2ru + 4s) u3 - 2ru2 + 4su - 8y2 = 0. The roots of this equation are it = —2a, —2/?, —2y. [In fact, just change variables with u = —2vy use the identity (4.12a), and reduce the polynomial to -8(u3 + rv2 + sv + t)] Let us take u - -2oc. Then (4.16) gives (m2 - r - 2a)2 = -4am2 - Sy2m + (4a2 + 4ra + 4s). Substituting for r and s in terms of a,/?,7 and using (4.12b), we obtain (m2 - a + p + y)2 = 4(ttim - /?i7i)2, m2-a + /? + 7 = ±2(arim - ft 71), m2 =F 2aim - a = -/? - 7 =F 2ft71, (m T «02 = ft2 =p 2ft71 + 7? = (ft T Ti)2- Taking square roots and denoting an independent sign by ±', we find m = ±ati±'(Pl T7i). We can reverse the steps, and we see that we have found four roots m in k to the equation (4.15). This proves the theorem. Let us digress for a moment and return to the subject of congruent numbers, first discussed in Subsection C of §111.1. Recall that n is congruent if n is the area of a rational right triangle. Using the easy half of Theorem 4.2, we obtain the following corollary, which says that (3) => (2) in Proposition 3.1.
88 IV: MORDELL'S THEOREM Corollary 4.3. Let n be a square-free positive integer, and suppose there exists a Q rational point (x, y) on 2 3 2 y = x — n x other than (—n,0), (0,0), (n,0), and oo. Then there exist three rational squares in arithmetic progression with common difference n. Hence n is congruent. Proof. Let P = {x\) y\) be a Q solution other than the trivial ones. Then y\ / 0 and P does not have order 1 or 2. Hence 2P is not oo, say 2P = (x2,J/2)« Then the theorem says that a^ + n, #2i a^d x2 — n are all squares, as required. By (2) => (1) in Proposition 3.1, n is congruent. Corollary 4.4 (Fermat). n = 2 is not a congruent number. Proof. If 2 were congruent, (1) => (3) in Proposition 3.1 would say that y2 = x3 — 4x has a nontrivial solution. Transforming via (3.14), we would obtain a nontrivial solution of the equation u4 -f v4 = w2. But this conclusion contradicts Proposition 4.1. 3. £(Q)/2£(Q), Special Case The main step in Mordell's Theorem is the proof of the following result. Theorem 4.5. If E is an elliptic curve over Q, then the abelian group £(Q)/2£(Q) is finite. Using an admissible change of variables, we may assume that E is of the form y2 = (x - a)(x - 0)(x - 7) (4.17) with integer coefficients. In this section we make the additional assumption that or,/?,y are in Q, hence in Z. Since or,/?, y are in Q, the subgroup of £(Q) of elements of order < 2 is of the form Z2 0 Z2. The picture of E(R) is qualitatively the same as in Figure 1.7c. We consider the group Qx/Qx2, writing it as - QVQx2={i2a365c7d...|a,6,c,c/,...€{0,l}}= £ 0Z2. ±,2,3,5,7,... (4.18) We continue to use the symbol Q] to denote a square.
3. E(Q)/2£(Q), SPECIAL CASE 89 Proposition 4.6. Let E be the elliptic curve (4.17) over Q with integer roots, and define (p : E(Q) —► Qx/Qx2 by {(x_a)Qx2 if p_ ^yj wiih p^ooand i^a (a-/?)(<*-7)Qx2 ifP = (a,0) Qx2 ifP = oo. (4.19) Then cp is a group homomorphism. Remark. Then <p(2P) = <p(P)2 € 1 • Qx2, so that <p descends to a homomorphism Va : E(q)/2E(Q) — QX/QX2. (4.20) Proof. Let Pi + P2 = P3. We are to show (p(Pl)<p(P2)(p(P3)~l is in Qx2. Since <p(P3) = y?(-P3) and <p(P3) = <£>(P3)_1, it is enough to show that Pi + P2 + P3 = 0 implies ^(PiM^Mft) is in Qx2. If one of the P, is oo, this conclusion is trivial. Thus we may assume P, is of the form (x,-, y,) for i = 1,2, 3. Case 1: No (xf-,t/,) is (a,0). Let y = mi -f & be the line on which Pi,P2,P3 lie. Each P, = (x,-,j/i) satisfies (x — a)(x - /?)(x - 7) = y2 = (mx -1- fc)2. Hence (x — a)(x — /?)(x — 7) — (mx -f &)2 is 0 for x — x\, x2, X3 with multiplicities. Thus (x - a)(x - /?)(x — 7) — (mx + b)2 — (x - x\)(x - x2)(x - X3). Putting x — oc gives (xi — a)(x2 — a)(x3 — a) = (ma 4- 6)2, and thus <£>(Pi)<£>(P2)v?(P3) is a square in Qx. Case 2: (zi,j/i) = (<*,0). Then neither (x2,y2) nor (x3,y3) is (a,0), since otherwise the other point would be 00. Let y = mx + 6 be the line on which Pi,P2,P3 He. As before, (x — a)(x — (3)(x - 7) — (mx -f b)2 is 0 for x = a,x2,x3 with multiplicities, and thus (x — a)(x — f$)(x — 7) — (mx + 6)2 = (x - a)(x — x2)(x - x3). (4.21) Now x — a has to divide (mx + 6)2, and hence mx + 6 = m(x — a). So (4.21) becomes (x — a)(z — /?)(x — 7) — m2(x — a)2 = (x — a)(x — x2)(x — x3) or (x - /?)(x - 7) — m2(x - a) = (x - x2)(x - x3). Putting x = a gives (a - /?)(<* - 7) = (a - x2)(a - x3), so that <p(Px) = <p(P2)<p(P3)- Thus (p(Pl)ip(P2)ip(P3) is in Qx2.
90 IV: MORDELL'S THEOREM Corollary 4.7. Let E be the elliptic curve (4.17) over Q with integer roots, and define y>a : £(Q)/2£(Q) — Q*/Qx2 as in (4.19). Define <pp similarly. Then the homomorphism <pax<Pp: E(Q)/2E{Q) — Qx/Qx2 * Qx/Qx2 (4-22) is one-one. Proof. Suppose (x,y) € £(Q) maps to Qx2 under <pa and (pp. Case 1: Suppose (x, y) is not oo or (a, 0) or (0,0). Then the condition is that x — a and x —/? are QJ. Since (x —a)(x — ($)(x — 7) = [""[, x — 7 is □ • By Theorem 4.2, (x,y) = 2(x',y') for some (x',y') in E(Q). Thus (x,y) is in 2£(Q). Case 2: Suppose (x,y) = (a,0). Then □ = ^(a.0) = (a-/?)(<*-7)Q*2 □ =¥»^a,0) = (a-/?)Qx2. Then a — ft and a — 7 are QJ. Also 0 is a square. By Theorem 4.2, (a,0) = 2(x',y') for some (1 ,y'), and thus (a,0) is in 2E(Q). Case 3: Suppose (x,y) = (/?,0). This case follows from Case 2 by symmetry. Let us write qx/qxa x QX/QX2 = J2 ®(Z2eZ2). (4.23) ±,2,3,... It will turn out that the image of <pQ x <pp in Qx/Qx2 x Qx/Qx2 is 0 in almost all coordinates of (4.23). Let d be the discriminant of the cubic polynomial (x — a)(x — /?)(x — 7), namely d=(a-p)2(a-7)2(/3-y)\ (4.24) (See Proposition 3.3 and the neighboring discussion.) If p is a prime and r is rational, we write pa\\r if r = pa<7 and q € Q has no factor of p in its numerator or denominator.
3. E(Q)/2E(Q), SPECIAL CASE 91 Proposition 4.8. Let E be the elliptic curve (4.17) over Q with integer roots, let <pQ x <pp be the homomorphism (4.22), and let d be the discriminant (4.24). Then the image of <pQ x (fp in (4.23) is confined to the coordinates i and primes p dividing d. Thus the homomorphism £?(Q)/2E(Q) —♦ £ 0(Z2 0 Z2) (4.25) ±,p\d is one-one. Remark. The group on the right of (4.25) is finite, and thus Proposition 4.28 completes the proof of Theorem 4.5 under the special assumption that the roots or,/?,7 in (4.17) are in Q. Proof. Let (x,y) be a point other than 00 in E(Q). Assume for the moment that x £ {a, /?, 7}. Fix a prime p— 2,3,5,..., and define integers a,b}c by pa\\(x-a), P4||(x-/?), pc\\(x-y). Since (4.17) says that (x — a)(x — fi){x — 7) is a square, we have a + 6 + c = 0 mod 2. (4.26) Suppose at least one of a, 6, c is < 0. Say a < 0. Since a is an integer, p'a' || (denominator of x). Therefore pa\\(x-a), p° || (*-/?), pa||(x-7). In other words, a = b •= c. From (4.26), it then follows that a = b = c = 0 mod 2. Consequently the image of (x,y) in the pth coordinate of (4.23) is 0. Suppose at least one of aybfc is > 0. Say a > 0. If p \ </, then p f (a — P) and hence p cannot occur in the numerator of x- /3= (x-a) + (a-/?). In other words, 6 = 0. Similarly, if p f rf, then p \ (a — 7), and we conclude c = 0. By (4.26), a is even. Thus again we have a = b = c = 0 mod 2, and again the image of (z,y) in the pth coordinate of (4.23) is 0. We have been assuming that x £ {<*,/?,7}. When x is in this set, then <pQ{x) and <pp(x) are products of a — /?, a — 7, and /? — 7, up to sign, and these are all prime to p if p \ d. Hence the pth coordinate of <pa(x) x <pp(x) is (0,0). Combining these conclusions with the one-oneness from Corollary 4.7, we obtain the proposition.
92 IV: MORDELL'S THEOREM 4. £(Q)/2£(Q), General Case So far, we have proved Theorem 4.5 in the special case that the roots a,/?,7 in (4.17) are in Q. In the general case, let k be the splitting field of the cubic polynomial (x — a)(x — fi)(x — 7) over Q. The rough idea is to imitate the proof in §3, replacing Q everywhere by this extension field k. But we encounter three complications: (1) The modified approach will lead to information about E(k)/2E(k), and we need to deduce information about £(Q)/2£(Q). (2) The decomposition (4.18) of Qx/Qx2 uses unique factorization in Z, and unique factorization may fail in the ring of algebraic integers in k. (3) The ± coordinate of Q*/Qx2 in (4.18) represents units in Z, and the units need to be controlled in whatever ring replaces Z in the general case. We can dispose of (1) rather quickly. Proposition 4.9. Let E be the elliptic curve (4.17) over Q, and let k be the splitting field of (x — a)(x — (3)(x — 7) over Q. Then the canonical homomorphism £(Q)/2£(Q) — E(k)/2E(k) (4.27) has < 22tfc:Ql elements in its kernel. Consequently if E(k)/2E(k) is finite, then so is E(Q)/2£(Q). Remark. We shall use the Galois group Gal(A:/Q). Note that if Q = (xyy) is in E(k) and cr is in Gal(fc/Q), then Qa = (a(x)}(r(y)) is in E(k) since E is defined over Q. Proof. Let E[2] = {Q' £E(k)\'2Q' = 0}. The points P in #(Q) that map to 0 under the canonical homomorphism are those in E(Q>) C\ 2E(k). For each such P, select Qp in E(k) such that 2Qp = P. Then we obtain a function XP : Gal(Jfc/Q) —> E[2] (4.28a) by the definition \p{a) = Qap-QP. (4.28b)
4. E(Q)/2E(Q), GENERAL CASE 93 [In fact, to see that Xp takes its values in E[2], we observe that 2(Q(p - QP) = {2QPy -2Qp = P°-P = 0.] Now let us show that AP = XP> implies P' is in P-f 2£(Q). In fact, if Qp — Qp = Xp = Xpi = Qapi — Qpi for all <r, then [Qp'-QpY = Qp>-Qp. Since fc is a normal extension of Q, fcGal(*/Q) = Q. Thus Qp> — Qp is in £(Q). Hence P'-P = 2(Qp<-Qp)£2E(Q)) as required. Thus for each member of the kernel of (4.27), we select P in #(Q) mapping into 2E(k)y and then we can associate Xp as in (4.28). What we showed above is that if we start with a different member of the kernel, then the resulting Xp is a different function from Gal(Jfc/Q) to E[2]. Hence the order of the kernel is < the number of functions from Cal(*,Q) to E[2]t which is 4lGa,<*'«l = 22I*:Q1. Complications (2) and (3) are more subtle. Let us list some examples. Example 1. y2 = x3 + x. The splitting field is k = Q(v/:rT)- The ring of integers l\y/— 1] is a unique factorization domain, and its group of units is {(\/IIT)A:}2=o-Z4- Example 2. y2 - x3 - 2x. The splitting field is k - Q(>/2). The ring of integers Z[>/2] is a unique factorization domain, and its group of units is the infinite group {±(1 ± V^)*}^.^ ^ Z 0 Z2. Example 3. y2 - x3 + 5x. The splitting field is k = Q(v/Z5). The ring of integers Z[«s/^5] is not a unique factorization domain. In fact, the numbers 2, 3, l±>/--5 are all prime, and 2-3 = (l + \/—5)(1 —\/—5)- The group of units is {±1} = Z2. In the first two examples, Z[>/=T| and Z[\/2] are unique factorization domains. So it is meaningful to write kx/kx2 = {units/(squares of units)}© {piPb2-- |a, 6, • • • G {0,1}} = {units/(squares of units)} © Y^ Z2; (4.29) p prime
94 IV: MORDELL'S THEOREM in the sum we list only one representative from each class {pc\e = unit} of associate primes. As in (4.19), we get a homomor- phism (p : E(k) —► kx/kx2 generically given by x —► (x — a)kx2. And as in Corollary 4.7, we obtain a one-one homomorphism <pQ*<Pp • E(k)/2E(k) —► {units/(squares of units)} 0 ^ (Z2 0 Z2). p prime (4-30) As in Proposition 4.8, we can discard all primes p except those dividing the integer discriminant d given in (4.24). There are only finitely many such primes because of unique factorization. Then E(k)/2E(k) has been mapped one-one into the finite group {units/(squares of units)} 0 ^(Z2 0 Z2), (4-31) and E(k)/2E(k) is therefore finite. For the third example, unique factorization fails for Z[^/-5], and (4.29) is no longer valid. However, if we let S = {1, 2, 22, 23,...} and define R = 5~1Z[>/—5], then R is a ring with Z[«v/--5] C R C ib, and it turns out that R has the following two properties: (1) R is a principal idea domain, hence a unique factorization domain. (2) The group of units in R is finitely generated (with —1 and ^ as generators). Since R contains Z[\/^5], the quotient fiek^6f R is still k. By unique factorization, (4.29) is valid if we interpret units and primes as units and primes in R. The argument yielding a one-one homomorphism (4.30) is then valid, and again we can collapse the image to (4.31). Since.jbhe group of units in R is finitely generated, it is a direct sum of cyclic groups, and thus {units/(squares of units)} is a finite group. Therefore (4.31) is a finite group, and E(k)/2E{k) has to be finite. The same argument as above reduces the finiteness of E(k)/2E(k) in the general case to the following result. Theorem 4.10. Let k be a finite extension of Q, and let Ok be the ring of algebraic integers in k. Then there exists a ring R with Ok C RCk such that (1) R is a principal ideal domain, hence a unique factorization domain. (2) The group of units in R is finitely generated.
5. HEIGHT AND MORDELL'S THEOREM 95 The proof of Theorem 4.10 relies on three famous results in algebraic number theory. In §9 we shall sketch the relevant background from algebraic number theory and then give a proof of this theorem. As we mentioned, the finiteness of E(k)/2E(k) follows from Theorem 4.10. Then Proposition 4.9 implies that £(Q)/2£(Q) is finite. This completes the proof of Theorem 4.5. 5. Height and Mordell's Theorem With the hard step finished, we can now address MordelPs Theorem. In this section we state the theorem, introduce the notions of "naive height" and "canonical height," and show how Mordell's Theorem follows from Theorem 4.5. Theorem 4.11 (Mordell). If E is an elliptic curve over Q, then the abelian group £(Q) is finitely generated. Using an admissible change of variables, we may assume that E is of the form y2 = x3 + Ax + B (4.32) with A and B in Z. For P = (xyy) in i?(Q), write x = - with GCD(p, q) — 1, and define the naive height of P by /i0(P) = logmax(|p|,M)>0. (4.33) By convention we define Ao(oo) = 0. For any given C, {P £ F(Q) | k0{P) <C} is a finite set. (4.34) Proposition 4.12. The naive height satisfies ho(2P) = 4h0{P) + 0(l), (4.35) where 0(1) is bounded independently of P £ E(Q).
96 IV: MORDELL'S THEOREM Proof. If P* = IP = (x*,y*) ^ oo, we know from (3.74) that * _ x4 - 64x2 - 266x - 68 ~~ 4x3 + 62x2 + 264x -f 66 ' where b2 = 0, 64 = 2A, 66 = 4fl, 68 = -A2. Thus „ _ x4 - 2Ax2 - 8Bx + A2 X ~~ 4(x3 + Ax + B) If x = - with GCD(p, g) = 1, then x* = — with q q* p* =p4- 2Ap2q2 - SBpq3 + ;4 V q* = 4q(P3 + Apq2 + Bq3). Let 6 = GCD(p*,g*), p** = p*/6, and q** = q*/6y so that x* = -^ in lowest terms. Then max(|p**|,|,"|) <max(|p'|,|g*|) < max(|p|, |9|)4 max(l + 2|>1| + 8|5| + A2, 4(1 + \A\ + \B\)) = Ca,b max(|p|,|g|)4, and h0(2P) < Ah0(P) + log CA,B ■ (4,36) To get an inequality in the reverse direction, we shall bound max(|p|, \q\) in terms of max(|p*|, \q"\), and we shall bound max(|p*U?*|) in terms of max(|p**|, |g**|). Let d = -AA3 - 27B2 #0 be the discriminant of the cubic polynomial x3 + AX + B. Then one can check the following identities: Adq7 = (3p3 - hApq2 - 27Bq3)q* - 4(3p2g + 4.493)p* Adp7 = -(A2Bp3 + (5A4 + Z2AB2)p2q + (26A3B + l92B3)pq2 - 3{A5 + $A2B2)q3)q* - A({AA3 + 27B2)p3 - A2Bp2q + (ZAA + 22AB2)pq2 + Z{A3B + 853)</3)p*. * (4.37)
5. HEIGHT AND MORDELL'S THEOREM 97 [To derive these formulas, one can use two applications of Proposition 2.16 and its proof, with f = p* and g = q*. In one application, X = p and the domain is k[q]; in the other application, X = q and the domain is k\p]. A symbolic manipulation program is a helpful tool.] From (4.37), we obtain max(|p|, |,|)7 < C'A<B max(|p|, |?|)3 max(|p*|, \q'\) and thus max(|p|, |?|)4 < C'AB max(|p*|, |?*|). (4.38) This inequality is one of the desired bounds. For the other one, we see from (4.37) that 6\4dq7 and 6\4dp7. Since GCD{p,q) = 1, 6\4d. Hence 6 is bounded, and max(|p'|,|g'|)<C"max(|p"|,|g**|). Combining this inequality with (4.38), we have m«(|p|, |?|)4 < CAiBC"naX(\pt'\,\9*t\)- (4-39) Inequalities (4.36) and (4.39) together prove the proposition. Proposition 4.13. There exists a unique function h : i?(Q) —► R satisfying (i) h(P) - h0(P) is bounded (ii) h(2P) = Ah{P). The function is given by A(P) = lim M?^P) (4 4Q) It has h(P) > 0 with equality if and only if P has finite order. Also {P | h(P) < C) is a finite set for each C. Remark. The function h is called the canonical height. PROOF. We begin with uniqueness. Let h satisfy (i) and (ii), with a bound C" in (i). Then \Anh(P) - ft0(2nP)| = \h(2nP) - h0(2nP)\ < C by (ii) and (i). Thus
98 IV: MORDELL'S THEOREM and h must be given by (4.40). {h (2n P) 1 — > is Cauchy. By Proposition 4.12, we can write \h0(2Q)-4h0(Q)\<C" with C" independent of Q. Then N > M > 0 implies \4-Nh0(2NP) - 4-Mh0(2MP)\ N-l ^2 (4-n-1/io(2n+1P)-4-n/i0(2nP)) n-M 7V-1 < J2 4-n-1M2n+1P)-4V2nP)| N-\ < J2 4"n-1C" < (4.41) N~l 4_AfC" and the right side tends to 0 as M and N tend to oo. Thus h{P) is defined. Letting N —► oo in (4.41) and taking M = 0 gives \h(P)-ho(P)\<^-, which proves (i). Result (ii) is clear from (4.40). Also h(P) > 0. If P is a torsion point, then 2nP lies in a finite set. So h(P) = 0. In addition, (i) implies {P\h(P) < C} is finite. Now suppose P has infinite order. Since {P\h(P) < 1} is finite, we must have h(2nP) > 1 for some n. By (ii), h(P) > 4"n > 0. Proposition 4.14. The canonical height on E(Q) satisfies h(P + Q) + h(P - Q) = 2/i(P) + 2/j(Q). (4.42) Proof.We shall prove that h(P + Q) + ft(P - 0) < 2/i(P) + 2h(Q). (4.43) Once we have done so, we can apply (4.43) to P' — P+Q and Q' = P—Q to get h{2P) 4- h(2Q) < 2h{P + Q) + 2/i(P - Q).
5. HEIGHT AND MORDELL'S THEOREM 99 Dividing by 2 and using property (ii) of h, we obtain 2/i(P) + 2h(Q) < h(P + Q) + h(P - Q). In combination with (4.43), this inequality proves (4.42). To prove (4.43), it is enough, by (4.40), to prove that h0(P + Q) + MP -Q)< 2MP) + 2/i0(Q) + O(l). If P or Q is oo, this inequality is trivial; if P + Q or P — Q is oo, the inequality reduces to Proposition 4.12. Thus let us write P = (x,y) with x = — in lowest terms, Q = (x',y') with x = — in lowest terms, Woo = max(H, \q\)y |*;I*, = maxflp'l, |<?'|), x± = x(P ± Q). Since P / ±Q, (3.72) and (3.73) in the chord case give x± = x(P±Q) = /V=FyV Vx'-x/ Thus y'2 + y2-(x' + x)(x'-x)2 x+ + x- = 2 = 2 (*' - x)2 _ n (x/3 + x3) + A(x' + X) + 2B- (a1 + x)(x/2 - 2xx' + x2) (x' - x)2 _ xx'(x' + x) + A(x' + x) + 2ff (x' - x)2 = 2 PPV? + Pq') + Agq'jp'q + pg') + 2g?y2 ^ {p'q-pq'Y
100 IV: MORDELL'S THEOREM and _ jz'- x)2{x" + xz'+ z2 + A)2 (x' - z)4 2(x + x')[(*' + x)(x/2 - xx' + x2 + i4) + 2ff] (x' - x)2 _ (x/2 4- zz' + x2 + A)7 - 2(x/2 + 2xxf -f x2)(g/2 - xx' + x2 + A) (x' - x)2 -AB(x + x') + (x'2 - x2)2 ,2 +(x + x')2 + (x' — xy V _ A\2 _AR(-rJL t'\ after simplification (x'-x)2 (xx' - A)2 - 4B(x + x') (4.45) (x' - x)2 ^{pp' -qq'A)2 -Aqq'Bjpq' + y'q) (p'q - Vq'Y Examining (4.44) and (4.45), we see that x+ + x_ = - and x+x_ = -• with m*x(\T\,\,\,\l\)<C\x\l\x'&. (4.46) Then x+ and x_ are the two roots of X2 — (-J X + f-J =0, namely X = —-(r ± y/v2 — 4st), and we see that X+ € 2t and *~ € 2t * ^4'47^ Let x+ = I*- and x_ = £=- in lowest terms. By (4.47), 2t = S+q+ = £_</_ for suitable integers 6+ and 6-. Then (M-)(?+9-) = 4<2. (4.48) From 7 = x+ + x_ = h — = , we obtain 4r*2 (p+tf- +P-?+)^ = W_ = 5x6_
and hence From we obtain and hence 5. HEIGHT AND MORDELL'S THEOREM 101 M-(P+?- + P-?+) = 4r<. (4.49) 5 -„ . _P±Etl T - X+X- - : t q+q- 4st2 (p+p-)< = sq+q- = M- («+p+)(««P-) = 4^. (4.50) We shall show that <|£+6_, and then |p+?_ + p_g+| < 4|r| from (4.49) |p+p_| < 4|s| from (4.50) (4.51) ltf+0-1 <4|t| from (4.48) To see that <|<5+£_, first fix a prime p. We construct by (4.50) integers a and 6 > 0 such that pa|6+p+, p6|£_p_, and pa+b is the exact power of p dividing t. Since 2t — 6+q+ = 6-q-r pa\6+q+ and p6|6_<j_. Then p° dividesGCD(£+p+,6+<j+) = £+, andp6 dividesGCD(£_p_, £_</_) = £_. So pa+6|£+£_. Since p is arbitrary, <|6+£_. This proves (4.51). We readily check the numerical inequality max(|p+|,|g+|)max(|p_|,|g_|) < 2max(|p+g_ + p_?+|, |p+P-|, lfl+tf-1)- Substituting in the right side from (4.51) and then using (4.46), we obtain max(|P+|,|g+|)max(|p_|,|?_|) < 8max(|r|, \s\,\t\) < 8C*|*£>U- Consequently ho{P + Q) + h0(P -Q)< 2h0(P) + 2h0(Q) + log8C, and the proposition follows.
102 IV: MORDELL'S THEOREM Proof of Theorem 4.11. Since Theorem 4.5 has shown E(Q)/1E(Q) to be finite, there exists some sufficiently large C such that the set S = {PeE(Q)\h(P)<C) contains a representative for each coset of E(fy)/2E(Q). The set S is finite. We claim it generates E(Q). Assume the contrary, and let P E E(Q) be outside the group generated by S. Since {Pf G £(Q) \h(P') < C'} is finite, we may assume P is chosen with h(P) as small as possible among all points outside the group generated by S. Choose Q G S with P = Q mod2£(Q), say P = Q + 2R. By Proposition 4.14, we have either KP + Q)<h(P) + h{Q) or h{P -Q)< h(P) + h(Q)i fix the sign of h(P ± Q) so that this happens. From P + Q = 2(Q + R) and P - Q = 2fi, we have P±Q = 2P' for some P' G E(Q). Then Ah{P') = h(2P') = h(P ±Q)< h(P) -f h(Q) <h(P) + C<2h(P)<4h(P). So h{P') < h(P). By minimality, P' is in the group generated by S. Since Q has this property and P ±Q = 2P;, P has this property, contradiction. 6. Geometric Formula for Rank We have now proved that the Q rational points of an elliptic curve E over Q form a finitely generated abelian group. Thus £(Q)-Zr0,F, (4.52) where F is a finite abelian group. The group F is uniquely defined as the torsion subgroup, and Chapter V will show how to determine it completely for^any particular E. The integer r is the rank of E(Q). In this section we shall give a geometric limit formula for the rank, under the special assumption that E is of the form y2 = x3 + Ax + By A, Be Z. (4.53) The tool will be the canonical height h defined in §5. We give a complete proof modulo one result from Euclidean Fourier analysis.
6. GEOMETRIC FORMULA FOR RANK 103 Proposition 4.15. With E as in (4.53), there exists a unique Z bilinear form (P, Q) on E(Q) such that (P, P) = h(P). This form descends to £(Q)/F = Zr and is positive definite there. Remarks. Let {P{} be a Z basis of Zr, and let ctJ = (Pi,Pj), so that the form is given by Then the proof will show that the symmetric matrix (c{j) is positive semidefinite and that (£li m»^»» Si m>^«) — 0 only when all m, are 0. Subsequently we shall see that (c,j) is actually a positive definite matrix. Proof. If the bilinear form exists, it has to be given by (P, Q) = \ (h(P + Q)- h(P) - h(Q)). (4.55) This proves uniqueness. For existence, define (P}Q) by (4.55). Then (P, Q) is certainly symmetric. For additivity in the first variable, we use Proposition 4.14 several times. First we have (P,-Q) = l(h(P-Q)-h(P)-h(Q)) = -i(h(P + Q)-h(P)-h(Q)) = -(P,Q). (4.56) Then we can write (P + P',Q) + (P-P',Q) = |(A(P + P' + Q) + h(P -P' + Q) - h(P + P') - h(P - P') - h(Q) - h(Q)) = $(2h{P + Q) + 2h{P') - 2h(P) - 2h(P') - 2h(Q)) = 2(P,g). (4.57) Interchanging P and P' and using (4.56) gives {P+P>,Q)-(P-P>tQ) = 2(P',Q). Adding this to (4.57), we obtain (P + P',Q) = {P,Q) + (P',Q).
104 IV: MORDELL'S THEOREM Thus the form is bilinear. Next let us observe that \(P,Q)\2 < h(P)h(Q). (4.58) In fact, the bilinearity proves that 0 < h(nQ ~ mP) z= n2/i(Q) - 2mn(P, Q) + m2h{P). Hence /i(Q)A2-2(P,Q)A + h(P)>0 for all A £ Q, and the inequality extends to all A G R by a passage to the limit. The discriminant must then be < 0, and the result is (4.58). If Q is a torsion point, then h(Q) = 0, and (4.58) shows that (P>Q) = 0. Hence o = </>, Q) = * {h(P + Q) - A(P) - fc(Q)) = i (AC + Q) - W), and h(P + Q) = h(P). It follows that the form descends to £(Q)/F. On Zr the form is given by (4.54). Since h > 0, we see that for all systems of rationals j^}. ..,^. Passing to the limit, we see that 52sj^*c»i^i ^ 0 f°r a^ rea^s Ai,...,Ar. Thus (c,-j) is positive semidefinite. Let us now sharpen the conclusion of Proposition 4.15 by showing that the matrix (c,-j) is positive definite. We shall use the following celebrated result of Minkowski. Lemma 4.16 (Minkowski). If E is a compact convex set in Rr containing 0 and closed under negatives and having volume > 4r, then E contains a nonzero member of Zr. Remark. Fine tuning of the proof would show that the same conclusion holds when the volume is merely > 2r. Proof. Let N be an integer large enough so that the standard cube C with center 0 and side AN contains E. Suppose E contains no nonzero member of Zr. We claim that the sets {/ + ^E} for / 6 Zr are disjoint.
6. GEOMETRIC FORMULA FOR RANK 105 In fact, if li + ^ei = /2 + \t<i with /i ^ /2, then /i -/2 = ^(e2 — ci), and this is in E since c2 and -e! are in E and i? is convex. If every coordinate of / is < N in absolute value, then / -f ^E is in C. Hence (4AT)r = vol(C) > £ vo,(/ + iE) |coords l|<JV > (2yv)rvoi(^) = 2-r(2;vy voi(E). Since vol(Z?) > 4r by assumption, this is a contradiction. Proposition 4.17. With E as in (4.53), the symmetric positive semidefinite matrix (cl;) in (4.54) is actually positive definite. Proof. Since (ctJ) is symmetric and positive semidefinite, we can choose column vectors t/1),... ,v(r) that form an orthonormal basis of eigenvectors of (c,j) with respective eigenvalues \W> ...> \(r)>Q. We are to prove that A^r) > 0. Arguing by contradiction, suppose A(r) = 0. Let {e^} be the standard orthonormal basis of column vectors, and write vW = ^f v; V*\ We shall identify P = ]T^ mi^i w^h tne column vector ]T\ ™>i^x\ so that the Zr quotient of E(Q) gets identified with the standard lattice of column vectors. The form (•, •) then transfers to column vectors with the definition \ i j I ij Then = I>(t)*2 (4-59)
106 IV: MORDELL'S THEOREM since the v^ are orthonormal. Let e be the minimum value of h(P) = (P, P) for P £ F. This exists and is > 0 since {P\h(P) < C] is always finite. Let E be the compact convex set of column vectors with M chosen so large that \o\(E) > 4r. By Lemma 4.16, E contains a nonzero lattice point ^b^v^ <-+ P. But then (4.59) gives \ k I I *=1 r-1 = £a(*>*£ since A<r> = 0 k-l r-1 <EA(1)(2^(T)) since £>« is in £ Jb=l e "" 2' and we have a contradiction. Thus A^r^ > 0, and (c,j) is definite. The matrix (cij) = ((Pi,Pj)) depends on the choice of the basis {P,} of Zr, but its determinant does not. The elliptic regulator of E over Q is the number «js/Q=det((Pi>Pi)). Proposition 4.17 shows that Re/q > 0. This number enters into the statement of the third form of the Birch and Swinnerton-Dyer Conjecture discussed in Chapter I. We can now give the geometric interpretation of rank. We use the symbol ~ to denote "asymptotic to," in the sense that the ratio tends to 1. Proposition 4.18. With E as in (4.53), the following formula is valid as T —► oo: #{(*,») €£(Q) | k|eo<T} { (=#(F) ifr = 0 ~ M^L(lo&Ty/2 ifr>0. R VB/Q Here #(•) refers to the number of elements in a set, and Or is the volume of the unit ball in Rr.
7. UPPER BOUND ON THE RANK 107 Proof. We may assume r > 0. Let (cij) be the positive definite matrix in (4.54). With e^ as the standard basis of Rr, we borrow the following fact from Euclidean Fourier analysis: The number of lattice points ^n;e(l) with Y2nicijnj <* ls nrr/2 ~ det(Cii)i/2 as t —► oo. Consequently the number of points J2niP* 4- / in £(Q) = Zf0F with ft(£ mPi + /) < t is #(F)nrr/2 as 2 —► oo. Since ft — fto is bounded and r > 0, the same asymptotic estimate is valid for fto- Taking t — logT and using the definition of fto, we obtain the proposition. 7. Upper Bound on the Rank Although Proposition 4.18 does give a limit formula for the rank of £(Q), the formula is not very practical as a computational tool, even for getting an idea of what the rank is. In this section we shall take a different approach to estimating the rank, namely by sharpening the bound on the order of £(Q)/2£(Q). For simplicity we return to the situation of §3, where the elliptic curve E is given by (4.17) and the roots a,/?,7 of the cubic polynomial are integers. We shall obtain an upper bound for the rank that is sharp in a number of cases. The same principle applies in the more general setting of §4 (where a,/?,7 are not necessarily integers), but the upper bound is inclined to be too big. In the situation of §3, let us notice an easy upper bound for the rank r. Proposition 4.8 shows that the order of E(Q)/2£,(Q) is bounded above by 2 to the power s = 2-|-2#{primesp \ p\d), d being the discriminant. On the other hand, the torsion subgroup F of £(Q) contains Z2 0 Z2 as a subgroup and, because of the structure of #(R), contains no more than two cyclic summands of even order. It is thus the direct sum of a group of odd order and two cyclic groups of
108 IV: MORDELL'S THEOREM even order. Consequently F contributes 22 to the order of i?(Q)/2i?(Q). Meanwhile the free abelian part V of £(Q) contributes T to the order of £(Q)/2F(Q). Thus r+2<s=2+ 2#{primes p \ p \ d}} and our easy upper bound is r <2#{primes p \ p \ d). (4.60) Our objective is to sharpen (4.60). Let us call p good if p \ d p fairly bad if p divides exactly one of a — /?, /? — 7, a_=- 7 p very bad if p divides all three of a — /?,/? — 7, a — 7. These notions are related to the discussion in §111.5, but we omit the details. Let n\ = number of fairly bad primes ri2 = number of very bad primes Our first improvement of (4.60) is contained in the following proposition. Proposition 4.19. Let E be the elliptic curve (4.17) over Q with integer roots a,/?,7, and let r be the rank of £(Q). Then r < ni + 2n2- 1. (4.61) Proof. We go over the proof of Proposition 4.8 in order to reduce the image of E(Q)/2£(Q) —> J2 ® (Z2 © Z2) (4.62) ±.pM while keeping the mapping one-one. We shall show that the image in the ± coordinate lies in a subgroup Z2 of Z2 © Z2 and the same is true of the pth coordinate if p is fairly bad. Then the same analysis that gave (4.60) will now give (4.61). First consider the ± coordinate. The roots a,/?,7 are integers and can be ordered. Let us say a < (J < 7, so that x-a>x-/?>x-7. (4.63)
7. UPPER BOUND ON THE RANK 109 If x £ {a,/?,7}, the possible signs in (4.63) are apparently + + +, 4- -\ , -| , and — . For a point (x,y) in £7(Q), the product (x — or)(x — 0)(x — 7) equals a square (namely y2), and thus only -f + -f and H can occur. Therefore the ± coordinate of the image of <pa in (4.20) and (4.25) is 0, and the first factor of Z2 can be dropped. [Note that if (4.25) used <pa x y?7 or <pp x ^?7, then the image in the ± coordinate would still be in a Z2 subgroup of Z2 0 Z2, namely 0 0 Z2 and diag(Z2) in the respective cases.] Actually the above argument applies only when x £ {a,/?,7}. When x = a, (4.19) shows that the i component of <pQ(x) is obtained from sgn[(a — (3)(a — 7)] = -f 1 and hence is trivial. When x is /? or 7, then the i component of <p0(x) is obtained from sgn(x — cv) = +1 and hence is trivial. Now consider the p coordinate when p is fairly bad. First let us assume that p I (ot—/3). Let (x, y) be a point other than 00 on i?(Q), and suppose temporarily that x ^ {a,/?, 7}. Define integers a,6,c by pa||(*-a), p* |l (*"/?)■ Pc||(*-t). In (4.26) we saw that a -f 6 + c = 0 mod 2. Also we saw that if any of a, 6, c is < 0, then a = 6 = c = 0 mod 2, so that the image of <£>a x y?^ in the pth coordinate is (a mod 2, 6 mod 2) =■ (0,0). Suppose a > 0, so that p | (x — a) in the sense that the numerator of x — a hasp as a factor. Since p f (or —7), we have p \ (x —7). Thus c = 0 and a -f 6 = 0 mod 2, and the image of (pa x ^ in the pth coordinate is contained in diag(Z2). Suppose instead that b > 0, so that p | (x — /?). Since p f (/? — 7), we have p f (x — 7). Thus c = 0, a + 6 = 0 mod 2, and the image is in diag(Z2). Finally suppose instead that c > 0, so that p I (x — 7). Since p f (7 — a) and p f (7 — /?), we have p f (x — a) and p | (x — /?). Thus a = b — 0, and the image 0 is in diag(Z2). Exceptional cases occur when x £ {or,/?,7}. If x = a, then <pa(x) — (a — (3){ot — 7) and y>/?(x) — 0 — ex. Since <Pa{x)/<Pp{x) is not divisible by p, the image of <pa(x) x <pp(x) in the pth coordinate is again contained in diag(Z2). If x = /? instead, then (fa(x) = /? - a and ^(x) = (/? — a)(/J — 7). Since <pp(x)/(pa(x) is not divisible by p, the image of (poi(x) x ^0(2) in the pth coordinate is contained in diag(Z2). If x = 7 instead, then (pa{x) = (7 — a) and ^(z) = (7 — /?) are not divisible by p. So a = 6 = 0, and the image 0 is in diag(Z). We have been assuming that p | (a — /?). If p | (0 — 7) instead, the pth coordinate of the image of <pa x <pp is in 0 0 Z2, while if p | (a — 7), the pth coordinate of the image of (pa x (pp is in Z2 0 0. In each case the image is confined to a Z2 subgroup of Z2 0 Z2. This completes the proof.
110 IV: MORDELL'S THEOREM Once again let us return to congruent numbers. The integer n is congruent if n is the area of a rational right triangle. Let us borrow the following lemma from Theorem 5.2. Lemma 4.20. If n is a square-free integer and E is the elliptic curve y2 = x3 - n2x, then the torsion subgroup of E(Q) is Z2 8 Z2. Proposition 4.21. A square-free integer n fails to be congruent if and only if the elliptic curve E : y2 = x3 - n2x, has the property that E(Q) has rank 0. Proof. If #(Q) has rank 0, it is a torsion group, and Lemma 4.20 shows it to be Z2 ® Z2. By (1) => (3) in Proposition 3.1, n is not congruent. Conversely if n is not congruent, Corollary 4.3 shows that E has no Q rational point other than (—n,0), (0,0), (n,0) and 00. Hence £(Q) has rank 0. Corollary 4.22 (Fermat). n = 1 is not a congruent number. Proof. The elliptic curve E with y2 = x3 — x is of the form considered in Proposition 4.19, with {a,/?,7} = {—1,0,1}. All primes are good except p = 2, which is fairly bad. Thus n\ = 1 and ri2 = 0. By Proposition 4.19, #(Q) has rank 0. By Proposition 4.21, n = 1 is not congruent. We already saw in Corollary 4.4 that n = 2 is not a congruent number. If we attempt to reprove this result from Propositions 4.19 and 4.21, we run into a problem. The relevant elliptic curve is y2 = x3 — 4x, with {<*,/?, 7} — {—2,0,2}. All primes are good except p = 2, which is very bad. Thus n\ = 0 and n2 = 1, and Proposition 4.19 gives us the estimate r < 1 for the rank. We do not immediately get r = 0. However, an even more careful analysis of the proof of Proposition 4.8 will give the sharper estimate. We need to study the relationship between the ± coordinate of (pa x <pp and the p = 2 coordinate. In Table 4.1 below, we list the effect of (p = <pa,<pp,<py on an x(P) other than or,/?,7. Each column for ± gives a conceivable configuration of signs, and each column for p = 2 gives a conceivable set of residues modulo 2 for the power of 2 in <p(x). We keep in mind that (x + 2)x(x — 2) has to be a square.
7. UPPER BOUND ON THE RANK 111 Map Image mod Qx2 ± 2 \ <Pa x + 2 ++ 0011 (pp x + — 0 10 1 \ <py x-2 + - 0110 Table 4.1. Image of <p for y2 = x3 - Ax with x £ {-2,0,2} At all other primes the image of each <p is a square. Conceivably all 8 combinations of a column for ± and a column for p = 2 could occur; actually we shall see that there is a restriction. The idea is to adjust P by an element of order 2 to make the p = 2 column trivial. For P = (<*,0), (/?,0), (t,0) the corresponding information is assembled in Table 4.2. Map <P* <Pp Vi Image of (-2,0) 8 -2 -4 ± + - - 2 1 1 0 Image of (0,0) 2 -4 -2 ± + - - 2 1 0 1 Image of (2,0) 4 2 8 ± + + + 2 0 1 1 Table 4.2. Image of tp for y2 = x3 - Ax with x e {-2,0,2} Since all nontrivial columns appear for p = 2 and since each <p is a homomorphism, we can adjust any point P under study by adding an element of Z2 0 Z2 so that (pQ) <pp, and y?7 all map the adjusted point into (0,0) in the p = 2 coordinate. For the adjusted point, which we still call P = (x(P),y(P)), the image of cp is as in Table 4.3. Map fa f& <Py Image mod Qx2 x + 2 X x-2 ± + + + - + - 2 0 0 0 Total □ D □ -□ D -□! Table 4.3. Adjusted image of <p for y2 = x3 — Ax
112 IV: MORDELL'S THEOREM There are only two possibilities, one of two columns from i and the column for p = 2. We shall show that the second possibility, namely * + 2 = [], * = -□. x-2 = -Q, (4.64) does not occur. Then it will follow that r = 0. Arguing by contradiction, suppose (4.64) occurs. Subtracting the first two equations gives If these squares have even denominators, then the power of two is the same for each and is even; also the numerators must be odd. Clearing denominators, we obtain 0 = Q + □ mod 8 with both squares odd integers, and this is a contradiction. Thus the first two squares in (4.64) have odd denominators. Subtracting the last two equations in (4.64) gives 2 = -D + D- and the first of these squares has an odd denominator. Therefore so does the second. Clearing denominators, we obtain 2m2 = -[] + Q mod 8 (4.65) with m an odd integer and both squares equal to integers. Then m2 = 1 mod 8, and (4.65) is impossible. Thus (4.64) cannot occur, and y1 = x3 — 4x has rank 0. This argument has given a different proof that n — 2 is not congruent. A similar technique gives refined information about the rank of £(Q) for t/2 = x3 — p2x, where p is an odd prime. The result is as follows. Proposition 4.23. Let p be an odd prime, and let E be the elliptic curve i "\ i y = x — p x. Then the rank r of £(Q) satisfies r < 2 if p = 1 mod 8 r = 0 if p = 3 mod 8 r <\ ifp=5or7 mod 8. Consequently any p = 3 mod 8 is not a congruent number.
7. UPPER BOUND ON THE RANK 113 REMARK. The proof will use some facts about quadratic residues from elementary number theory. Proof. We argue as above, taking {a,/?, 7} = {-p, 0,p}. All primes are good except 2 and p; 2 is fairly bad, and p is very bad. The information analogous to what is in Table 4.1 is assembled as Table 4.4. Image mod Qx2 ± 2 p x+p + + 0 1 0 0 11 x + - 0 0 0 1 0 1 x-p + - 0 1 0 110 Table 4.4. Image of <p for y2 = x3 — p2x with x £ { — p,0,p} If we make a table like Table 4.2 that lists what happens for the torsion points (—p, 0), (0,0), and (p,0), we find that the torsion points account for all nontrivial columns for the prime p in Table 4.4. Since each (p is a homomorphism, we can adjust any point P under study so that <pa) (fpy and (py all map the adjusted point into (0,0) in the p coordinate. For the adjusted point, which we still call P = (x(P),y(P)), Table 4.5 shows the four possibilities for the combined contribution from a column for ± and a column for 2. The point is that certain columns listed under "Total," depending on p, cannot occur. If only two columns can occur, then r < 1. If only one column can occur, then r = 0. Map ¥>a w ^7 Image mod Qx2 x + p x x — p □ □ □ Total □ "□ -□ 2D □ 2D 2D -□ -2Qj Table 4.5. Adjusted image of <p for y2 — x3 — p2x Map <pa
114 IV: MORDELL'S THEOREM □ Suppose I — Q I occurs. Subtracting the expressions for x + p and x — p, we obtain 2p = Q -f Q. If p occurs in the numerator of one square, then p2 occurs in the numerators of both squares and p2|2p gives a contradiction. Thus the numerators of the squares are prime to p. Clearing fractions and reducing modulo p, we obtain 0 = f~~] + [~~] mod p, with both squares prime to p. Then — 1 is a square modulo p, and it follows that p = 1 mod 4. Thus occurring => p = 1 or 5 mod 8. (4.66) (2°\ Suppose Q I occurs. The first two entries give p = 2Q - Q. Arguing as in the previous paragraph, we are led to 0 = 2Q] — £J mod p, with both squares prime to p. Then 2 is a square modulo p, and it follows that p = ±1 mod 8. Thus p = 1 or 7 mod 8. (4.67) 2D Suppose I — Q I occurs. The second and third entries give p = 2Q — Q, and the previous paragraph again shows that p = 1 or 7 mod 8. The first and second entries give p = 2Q] + []]. Arguing as above, we conclude that —2 is a square modulo p, and it follows that p = 1 or 3 mod 8. Thus occurring => p=l mod 8. (4.68) In view of how the number of occurring columns affects the rank, the implications (4.66), (4.67), and (4.68) prove the proposition.
8. CONSTRUCTION OF POINTS IN E(Q) 115 8. Construction of Points in #(Q) The refinements to the method of descent in §7 are not good enough to produce members of #(Q) with certainty, but they are excellent at cutting down the possibilities enough to make a computer search manageable. We shall consider two examples in this section. The first example will go in the converse direction to Proposition 4.23 by showing that all small primes p = 5 mod 8 are congruent numbers (i.e., r = 1 in Proposition 4.23). The second example will solve Fermat's problem for Mersenne that was introduced in Subsection E of §111.1. A, Congruent numbers For the first example, fix a prime p = 5 mod 8, According to (4.66), if p is congruent, then we can write f x + j/ x for suitable squares in Q and for some x — x(P). To quantify matters write /x + jA / a2 \ Then we must have a2 — c2 = —262, which we write as a2 = c2-2b2. (4.70) We parametrize the solutions by the method of Diophantus in Chapter I: We change the equation from projective to affine form by writing it as l = u2-2v2. (4.71) Then (u,v) = (1,0) gives one solution. A line through (1,0) is u = tv+l for any choice of t in Q. Substituting, we obtain 1 =(<v+ l)2-2t;2, It which is solved by v = 0 and v = -. Thus our parametrization of solutions of (4.71) is , x /2-M2 2t \
116 IV: MORDELL'S THEOREM Write i = r/s in lowest terms. Then -(a,b,c) = (l.w.u) = * J2s2 - r\2rs,2s2 + r2), a zs' — r* and a = (2s2 - r2)A 6 = (2rs)A (4.72) c = (2s2 + r2)A for some A € Q. 1 777 We shall show that A is of the form —. In fact, say A = — in lowest n n terms. Then (4.72) shows that m divides the integers na and nb. Hence 77i divides a and 6. Referring to (4.69), we see that m2 divides a2+b2 = p. Thus m — ±\ and we may take m — \. Since A = —, the equation n p = a2 + 62 = (r4 + 4s4)A2 shows that r4+454=pn2. (4.73) At this stage we have thinned out the possibilities considerably. We look for a solution (r, s, n) of (4.73), and it determines (a, 6, c) in (4.72), since A = —. Then (a,byc) determines an x of the correct form so that n (4.69) holds, and x is of the form x(P) for a point P in £(Q). By letting r and 5 range from 1 to 50 and seeing whether the square root of p~l(rA + As4) is very close to an integer, we are led to all the entries in Table 4.6 except the one for p — 53. Since r and 5 (and ultimately x(P)) exist for each p in the table, all these numbers p are congruent.
8. CONSTRUCTION OF POINTS IN E(Q) 117 p 5 13 29 37 53 61 109 149 r 1 1 7 1 286 41 6 14 s 1 3 5 21 119 39 7 17 n 1 5 13 145 11890 445 10 50 x(P) -4 36 25 4900 ~~169~ 1764 21025 1158313156 35343025 10227204 198025 1764 25 56644 625 | Table 4.6. Some primes p = 5 mod 8 that are congruent numbers To discover (r, s,n) for p — 53 in the table, we refine the argument still further. We are trying to solve (r2)2 + 4(s2)2 = 53n2. Consider the affine curve j2+4*2 = 53 over Q. One solution is (j>k) = (7,1), and the method of Diophantus in Chapter I will allow us to parametrize all solutions. A line through (7,1) is (ib — 1) = t(j — 7) for any choice of /. Substituting for k> we obtain a formula for j in terms of t. Then we can recover Jfc. The result is /28t2-8f-7 -4t2-14t + l\ W'*)-^ 4*2 + ! » 4,2 +1 J'
118 IV: MORDELL'S THEOREM which we can reparametrize as Putting t = d/e in lowest terms gives (r2,s2, n) = fi(7d2 - 4de - 7e2, -d2 - Ide + e2, d2 + e2). (4.74) From the third entry, we see that /i > 0. The numerator of ft can be taken to be 1 since GCD(r, s) = 1. If the denominator is hy then h divides 7d2 - Ade - 7e2 + 7(-d2 - Ide + e2) = -53de and d2 + e2. Any prime factor of h must divide 53de. If it divides d or e, then it divides d2 -f e2 and hence both d and e, contradiction. Thus \i = ^ or /, = i. _ Handling // = ^ does not look very easy. So let us look for a solution with \jl — 1. Our equality is then (r2, s2, n) = (7d2 - 4de - 7e2, -d2 - 7de + e2, d2 + e2). (4.75) We analyze -d2_7de + e2 = r-| (=s2) by the method of Diophantus. The affine equation is -d'2-7d'e' + e'2 = l, and (d', e') = (0,1) is a solution. The line e' = d't + 1 leads to l«>e>>- ^_t2 + 7< + 1' _t2 + 7t + 1y- If t = £/t] in lowest terms, then (d,e,a) = u(2(r, - ln\ ? + n\ -? + 7£ij + n2). (4.76) In the same way as for \i in (4.74), we find v — -^ or v — 1. We look for a solution with v = 1. Using a short computer search with small values of £ and 77 and with d and e determined by (4.76), we check whether the first entry 7d2 — 4de — 7e2 of (4.75) is a square, so that we can take it as r2. For f = 10 and rj = 3, we do get a square, and the result is what is listed in Table 4.6. The reader may wish to try such an analysis for p = 101, which is the only p = 5 mod 8 that is < 150 and is missing from Table 4.6.
8. CONSTRUCTION OF POINTS IN E(Q) 119 B. Fermat's problem to Mersenne We seek an integer-valued relatively prime Pythagorean triple (X, y, Z) such that the hypotenuse Z is a square and the sum of the legs is a square. In other words we want X2 + Y2 = Z2y Z = b2y X + Y = a2. With e = X - y, we obtain X and Y as f (a2 ± e)> and (3.17) showed the problem comes down to solving 264-a4 = e2. (4.77) A solution in the style of Fermat proceeds as follows. We can work a little sloppily, apparently discarding possibilities, because all we need is one solution. From X2 + Y2 = Z2y we write X = m2-n2y Y = 2mny Z = m2 + n2 (4.78) by Theorem 1.2. (Actually this form of a solution assumes that X is odd; but if it works, it works.) From b2 = m2 -f n2 we have m = r2 - s2y n = 2rsy b = r2 + s2. (4.79) Meanwhile, the Diophantus method of Chapter I leads from a2 = X + Y = (m + n)2- 2n2 to m-f n = t2 + 2u2, n = 2tu, a~t2-2u2, hence to m = *2-2*u + 2u2, n-2tu, a = t2-2u2. (4.80) Equating the two expressions for n in (4.79) and (4.80), we obtain rs = tu. Thus - = —, and we can write this as — in lowest terms. Then t s c r = kdy t = kcy u = ldy s = Ic. (4.81) Equating the two expressions for m in (4.79) and (4.80), we obtain r2 - s2 = m = t2 - 2tu + 2u2.
120 IV: MORDELL'S THEOREM Substitution from (4.81) gives Jfc2d2 - /V = *V - 2klcd + 2/2d2 and hence 0 = (c2 + 2d2) (0 2 - 2cd ({) + (c2 - <f2). (4.82) For l/k to be in Q, the discriminant must be a square in Q. Thus c2d2_(c2 + 2d2)(c2-d2) = n> which we rewrite as 2d4 - c4 = /2. (4.83) This equation is of the same form as (4.77), but the integers are usually much smaller here. At first it looks as if the method of descent is leading us to a proof that there is no solution. But there is a bottom nontrivial solution with (c>d) = (1,1). Substituting into (4.82), we obtain — = A: 2 / 0 or -. The solution — = 0 leads back to (a,6) = (1,1), which is not / 2 interesting. From — = -, we have / = 2, k — 3. Going backwards, we k 3 calculate r = 3 t = 3 u = 2 s = 2 m = 5 n= 12 0= 1 6=13 X = -119 Y = 120 Z = 169 e = -239 We must discard the result of this trial since X is negative. But we can continue. The (a, 6) from this trial becomes the (c, d) of our next trial. Putting (c,dy f) = (1,13,-239), we have / _ cd±f _ 13 T 239 2 84 h "" c2 + 2d2 " 1 + 2 169 " 3 an 113' / 2 From — = — -, we have / = —2, k — 3. Going backwards, we calculate r = 39 / = 3 u= -26 s = -2 m= 1517 n = -156 a = -1343 b = 1525 X = 2276953 Y = -473304 Z = 2325625 e = 2750257.
8. CONSTRUCTION OF POINTS IN £(Q) 121 This time we have a minus sign in Y and discard the result. We try instead ^, using / = 84, Jfc = 113. Then r = 1469 m = 2150905 X = 4565486027761 t = 113 n = 246792 Y = 1061652293520 u = 1092 a = -2372159 Z = 4687298610289 5 = 84 6 = 2165017 e = 3503833734241. This is the solution given in (3.16). To analyze the solution technique, we work backwards, obtaining a formula for (a, 6) in terms of (c,d). Actually we had a choice of a sign for £, and we may regard (a,6,e) as depending on (c,d,/). The correspondence will be nicer if we choose the ambiguous sign in j so that I _ cd-f k ~~ c2 + 2d? ' With this choice made, (a,6,e) becomes a function of (c, d> /). Meanwhile the tuples (a,6,e) and (c, d, /), which give Z solutions of (3.17), then give Q solutions of (3.18) and, by virtue of (3.19) and its inverse (3.21), Q rational points on the elliptic curve y2 = x3 + 8x. Table 4.7 shows what the effect of (c, d,/) —► (a,6,e) is, in terms of the group law for the elliptic curve. The table uses the abbreviations P0 = (8,-24) and T = (0,0). The point P0 generates an infinite cyclic subgroup of £(Q), and T has order two. d 1 1 -1 -1 13 13 -13 -13 / 1 -1 1 -1 239 -239 239 -239 Point O Po Po + T T -Po 2Po 2P0 + T -Po+T a 1 1 1 1 -1343 -2372159 -2372159 -1343 6 1 13 13 1 1525 2165017 2165017 1525 c 1 -239 -239 1 2750257 3503833734241 3503833734241 2750257 Point O 2P0 2Po O -2P0 4Po 4P0 -2P0 Table 4.7. Effect on E : y2 = x3 + 8x of Fermat's descent for 2b4 - a4 = e2
122 IV: MORDELL'S THEOREM The table shows that Fermat's descent genuinely corresponds to the passage P —► \P for the points in question. The reader may wish to carry out the symbolic manipulations appropriate for general P. 9. Appendix on Algebraic Number Theory The object of this section is to prove Theorem 4.10, which provides a tool that was used in the proof of Mordell's Theorem. Theorem 4.10 requires a certain amount of algebraic number theory as preparation, including three basic theorems. We shall state the necessary preliminary results with most of the proofs omitted. References are given in the chapter of "Notes." We regard Q as a subfield of C, the subfield of elements that are roots of polynomials P(X) with Z coefficients. The members of Q are called algebraic numbers. A number field A' is a subfield of Q that is finite-dimensional over Q. An algebraic integer is an algebraic number that is the root of a monic polynomial P(X) over Z. The set of such is denoted O. If 9 £ Q is a root of anXn + ... + aiX + a0, then an6 is a root of Xn + an.xXn~x + an-2anXn-2 + • • • + a.a^X + a0a^\ Hence 0 tty => k0 G O for some k € Z. (4.84) One step in the proof of unique factorization for Z[X] shows that if a monic polynomial in Z[X] has a monic factor in Q[X], then that factor is actually in Z[X]; consequently the minimal polynomial over Q of any 9 £0 has integer coefficients. The set O of algebraic integers is actually a ring. The standard proof of this fact uses the theory of symmetric polynomials. A simpler proof uses the following lemma, which is handy also for proving (4.88) below. Lemma 4.24. Let V be the additive group generated by complex numbers x\,... ,x/ that.are not all 0. If a is a complex number such that ax £ V whenever x € Vy then a is an algebraic integer. Let us write "left-by-a" for the operation of left multiplication by a. For an example of a group V as in the lemma, let a £ O satisfy Qn + an_iftn_1 -h • • • -h a0 = 0. Then left-by-a maps the Z span of
9. APPENDIX ON ALGEBRAIC NUMBER THEORY 123 an_1,... ,a, 1 into itself. When there are two such elements in O and we use the products of their powers to generate V', the result is the following proposition. Proposition 4.25. O is a ring. Fix a number field A'. It follows from the proposition that Ok = Of\K is a ring, the ring of algebraic integers in K. From (4.84) it follows that K is the field of fractions of Ok • The fact that the only rational roots of a monic polynomial in 2[X] are in Z translates into the equality OHQ = Z. (4.85) Let n = [K : Q]. If {ai,... ,an} is a basis of K over Q and if a is in K, we can write aa, = ]P. aijQj w^h aij € Q- We define Ntf/Q(a) = det(a,;) (norm of a) TV^/q(«) = Tr(a,j) (trace of a) These are functions from K to Q. It is easy to check that (1) Nx/q and Ttk/q are independent of basis. (2) N/<-/q is a multiplicative homomorphism from A'x to Q*, and TY/r/Q is an additive homomorphism from K to Q. (3) Trjr/Q(l) = n # 0. There are exactly n distinct isomorphisms of K into Q fixing Q, and they are given as follows. The Theorem of the Primitive Element says that K = Q(ct) for some a E K. Let P(X) = Xn + cn-iXn-1 + .-. + c0 be the minimal polynomial of a over Q, and let a\ = a, a2,... an be the (necessarily distinct) roots of P in Q. Then Vj{Cn-\(Xn~l + hCia + Co) = Cn_iO^~l + hCiQfj + C0. If/? is in Ky the elements 0j(/?) (with <T\(fi) = ft) are called the conjugates of/? in Q. They need not lie in K. However, each <tj{P) has the same minimal polynomial as /? over Q. Consequently <Tj carries Ok into 0. (4.86) To bring the conjugates of fi into the theory, one uses the following proposition.
124 IV: MORDELL'S THEOREM Proposition 4.26. For /? G K let g be the minimal polynomial of /? over Q. Then (deg g) \ n. Also left-by-/? as a Q linear map from K to K satisfies n det(*/ - left-by-/?) = g{x)r = [J (x - *j(/?)), i=i where r = n/(deg #). Using Proposition 4.26, one can show that n ^*/q(/*) = £*;(«• i=i By (4.85) and (4.86), N*/q and Tt/c/q carry 0* into Z. A unit in Ok is an invertible element. The set of units is denoted Oft. The first of the three basic theorems in beginning algebraic number theory is the Dirichlet Unit Theorem, which gives the structure of 0%. We shall give a weak form of the theorem as Theorem 4.29. Lemma 4.27. The units in Ok are the members e of Ok with Njc/Q(e) = ±1. Proof. If e is a unit, then ^k/q^"1) = N^/q^)""1 € Z shows that N/c/q(£) is an invertible integer, hence is ±1. Conversely we use (4.86), remembering that &\ = 1. Then shows that If Nk/q(£) = ±1, then the right side is in O. Lemma 4.28. The torsion subgroup of 0% consists of all 7Vth roots of unity e2irxllN for some N that is bounded in terms of n = [K : Q].
9. APPENDIX ON ALGEBRAIC NUMBER THEORY 125 One proof of Lemma 4.28 uses properties of cyclotomic polynomials and the formula for the Euler ^function. The isomorphisms <x; : K —► Q C C, 1 < j < n are of two types: (a) those carrying K into R. Say there are n of these. (b) those carrying K into C but not R. These come in pairs a and <r, where the bar refers to complex conjugation. Say there are 2r2 of these. Then r\ + 2r2 = n. Let <t\ ,..., <rri be of the first kind and GVi + l^rj + l, • • • ,0-ri+r3,0-ri+r2 be of the second kind. We use each <tj to form an absolute value on K: \\x\\j = Wj(x)\ for i < 5 < n + r2- Then the mapping Log : x —► (log \\z\\u..., log ||*||ri+ra) is a group homomorphism of K* into Rri+r2. Theorem 4.29 (Dirichlet Unit Theorem). On 0£y the kernel of Log is the (finite) torsion subgroup, and the image is a discrete subgroup of the subspace of Rri+r2 where *i + • • • + xri + 2xri+i + • • • + 2xri+r2 = 0. Consequently 0% is a finitely generated group of rank < r\ -f r2 — 1. The finiteness of the torsion subgroup is by Lemma 4.28. The version of the theorem stated here is not very hard to prove. The full version of the Dirichlet Unit Theorem says that the rank of 0\ equals v\ + r2 - 1, and its proof takes considerably more effort. But we do not need this stronger version. We turn our attention now to the structure of ideals. If a\>..., an are in K, their discriminant is the element of Q given by A/r/Q(ari,..., an) = det (TV K/ q («i a j))- If a\,..., an are in Ok then it is clear that their discriminant is in Z. It is not too hard to establish the following additional properties of A^/q' (1) Ajr/Q(<*i, • • • ,<*n) ^ 0 if and only if {c*i,... tan} is a basis of K over Q. (2) If {ai,..., orn} and {/?i,... ,/?n} are bases of K over Q and if ot{ — 5^. ciij/3j with all aij G Z, then AK/Q(au. ..,«„) = [det(a0-)]2A|f/Q(/?i>... ,/?„). (3) A^/Q(a1)...,an)3:[det((Ti(al.))]2.
126 IV: MORDELL'S THEOREM Proposition 4.30. (a) Every nonzero ideal / in Ok (including Ok itself) is additively a free abelian group of rank n whose Q span is K. (b) If / is a nonzero ideal in Ok> then Ak/q(<Xi > • • • > <*„) is the same for every Z basis e*i,..., an of /. (c) Every nonzero ideal in Ok contains a nonzero member of Z. (d) Every nonzero ideal in Ok has \Ok/I\ < oo. (e) Ok is a Noetherian ring, i.e., every ascending sequence of ideals terminates. (f) Every nonzero proper prime ideal in Ok is maximal. We define a product operation on nonzero ideals / and J in Ok by IJ = {all sums of products from / and J}. This product is associative and commutative, and Ok is an identity. If / and J are nonzero ideals in Okj we say that I and J are equivalent if (a)I — {0)J for some nonzero principal ideals (a) and (/?) in Ok- ^ is easy to check that this notion of "equivalent" has the following properties: (1) it is an equivalence relation (2) the product operation is a class property (3) the principal ideals form a single equivalence class, which acts as an identity. The number Uk of classes of nonzero ideals in Ok is called the class number of K. The class number equals one if and only if Ok is a principal ideal domain. The following is the second of the three basic theorems in beginning algebraic number theory. Theorem 4.31. The class number of a number field is finite. The proof requires a lemma in diophantine approximation and an elementary argument using the notions above. For the case that [K : Q] = 2, the classes of ideals correspond to certain classes of binary quadratic forms, and the finiteness of Hk (as well as its value) is more apparent. Corollary 4.32. Multiplication of classes of nonzero ideals in Ok is a group operation in which the identity is the class of principal ideals.
9. APPENDIX ON ALGEBRAIC NUMBER THEORY 127 The group in the statement of the corollary is called the ideal class group; its order is the class number of K. For the proof of the corollary, one first shows for nonzero ideals I and J in Ok and for a ^ 0 in Ok that (a)I = JI => (a) = J. (4.88) Lemma 4.24 is a tool in the argument. Then the finiteness of the class number forces two powers /' and P of an ideal I to be equivalent, and finally (4.88) allows one to show that /li_,l-1 is an inverse of the class of/. Next we give the third of the three basic theorems in beginning algebraic number theory. Theorem 4.33 (Unique factorization of ideals). Every proper ideal / in Ok is uniquely a product I = FL^i P?* > where the Pi are proper nonzero prime ideals and the k{ are integers > 0. The integers &, are characterized by the property: Pf* D I but P*'+1 ^ /. The pr6of of Theorem 4.33 begins with two preliminary facts. The first says for three nonzero ideals A,B,C of Ok that AB = AC => B = C. (4.89) The other preliminary fact says about ideals: To contain is to divide. Specifically if A and B are nonzero ideals of Ok, then BD A => there exists an ideal C with A = BC. (4.90) The existence of factorization of ideals uses (4.90) and parts (d) and (e) of Proposition 4.30. Uniqueness uses also (4.89), Proposition 4.30f, the finite class group (Theorem 4.31 and Corollary 4.32), and the inequality \Nk/q(<x)\ > 2 for a € Ok that is not a unit (Lemma 4.27). Proof of Theorem 4.10. Write h = hK- Let Iu...tIh be representatives of the finitely many classes of ideals (Theorem 4.31), and let us say 11 = (1). Let u; be a nonzero element of 7j, and put u = u{ • • • u^. Then u is in Ij for 1 < j < h. Define S = {1, uy u2,... }. Notice that (1) 1 € 5 and 0 £ S (2) 5 is closed under multiplication. From these properties it follows that S~1Ok = {s~1a \ s G 5, a G Ok} is a ring. We shall prove that S~1Ok is a principal ideal domain and that its group of units is finitely generated as an abelian group.
128 IV: MORDELL'S THEOREM If/ is an ideal in Ok, then J = S~~1Ok is clearly an ideal in S~1Ok- Every ideal Is of S~1Ok arises in this way. In fact, / = Is CiOjc is an ideal in Ok and / = S~l(Is HO*-) coincides with Is [To verify this equality, let i 6 /. Then i = s~la with a £ Is C\Ok- Since 5"1 C S~1Ok and since Is is an ideal in S~1Ok> « = s~xa is in Is- Conversely if i is in Is, write t = s~~la with a € Ok- Then a = si is also in Is, hence is in Is O (9/f, and i = s~la exhibits i as in s-1(isnoK) = i.) If /s is an ideal in 5 1Ok, we are to show that Is is principal. Put / = Is C\Ok- Then / is equivalent with some Ij, I < j < h\ let us write («)/ = (/%-. Since u is in Ij flS, we have 5"1/,- = S"1Ok, and thus (a)sJs = 5-1(a)5"1/ = S"1 (/?)/> = S-1Ok(P) = (£)s. (4-91) where (a)5 and (/?)s denote principal ideals in S"1Ok- FVom (4.91) let us see that /?/a is in S'xOK and 7S = (^0)5. (4.92) In fact, (4.91) allows us to write P = ai0 for some i0 € Is- Hence PI a = t0 £ Is Q S-xOK, and (/?/or)s C /5. In the reverse direction, let i 6 Is be given, and use (4.91) to write ai = /3x with x 6 S~1Ok> Then i = — x shows i is in (p/a)s, and we obtain ^5 C (P/a)s. This Of proves (4.92). Since Is was an arbitrary ideal in S~1Ok, (4.92) shows that S~1Ok is a principal ideal domain. This completes the first half of the proof. The second half of the proof will show that (S"1Ok)x is finitely generated. The argument is a little subtle, as more generators than it and Ox are needed. Let u-'a be in (S~lOK)x} and write (u^a)"1 = tT*/?. Then ap = tt'+*. So or is a divisor of a power > 0 of it. We seek a finite set of generators for the divisors of all powers > 0 of u. Thus let otP = ur with a,p eOK- Let (u) = ^.--P*- (4.93) be the factorization of (u) given by Theorem 4.33. Then we have (a)(/?) = (ur) = (u)r = ^r--.^r.
9. APPENDIX ON ALGEBRAIC NUMBER THEORY 129 By the uniquenesss in Theorem 4.33, (a) = Pj1 • ■ • Pjf with 0 < lj < kjr for 1 < j < N. By Corollary 4.32 (and Theorem 4.31), Pj1 is principal for each j. Write P? = (7;). For each j, write lj = qjh + r; with 0 < rj < h, so that (a) = (Tl)?,-(7N)?"Pr-P^. (4.94) Then a = Tl«l • • • 7^i for some i G Pf •• ■ Pr„N. Ot Dividing, we see that —^ ^ (= i) is in Ok- Hence we can rewrite Ti " '7n (4.94) as -v*1 ... ^N ( Q ] — (*JX . . . ***** ^ Pr» ... Pr" Ti "-In I 7*i . ..7** J - wi 7n ;m Ov • By (4.89) we conclude that Pri ... pr" — ( a \ In other words, for (4.94) to hold, the ideal P[l • • • P#N has to be principal, say />[..../>;/< = (sri rj, and then (4.94) is the statement that a = 7?1 ■ ■ -T^r, rwe with e in 0* . (4.95) From (4.95), we see that (S~lOx)x is generated by 71,..., 7^, the finite number of elements £r,,...,rft (since the ry's are < /i), and (9£. Theorem 4.29 shows that 0% is finitely generated. Hence (S~1Ok)x is finitely generated. Example. Let K = Q(>/z5). Then Ok = Zfv^]. The theory of binary quadratic forms shows that hx — 2. In the above proof we can take h — (1) and I2 — (2, 1 -f >/--5). For u we can use u\ = 1, W2 = 2, it = 2. The factorization (4.93) is (2) = P* with Px = (2,1 + y/-E). Thus JV = 1, and we can take 71 = 2. The only ideals P[l • • • PrNN that need to be considered are P* = (1) and P/, which is not principal. Thus the only 6rit...rN is 1. The units of S~1Ok are therefore generated by 7i=2,<$o = l',and(0tf)*={±l}.
CHAPTER V TORSION SUBGROUP OF £(Q) 1. Overview Let E be an elliptic curve over Q with notation as in the Weierstrass form (3.23). As usual, we can make an admissible change of variables so that all the coeflficients are in Z. Let A 6 Z be the discriminant of E. By Mordell's Theorem, £(Q) is a finitely generated abelian group. Our objective in this chapter is to study the torsion subgroup #(Q)tore- For any particular curve, we shall see that we can determine i?(Q)tors explicitly. The main tool in the analysis will be reduction modulo a prime p. For the curve E itself, with its Z coefficients, reduction modulo p will amount to considering the curve Ep over Zp with the Z coefficients taken modulo p. It is less apparent how to get reduction modulo p to be a well defined mapping rp defined from all of i?(Q), including points with factors of p in the denominators of the coordinates, into Ep(Zp). Once we have done this in §2, we will find that rp : £(Q) —► Ep(lp) is a group homomorphism if p \ A (i.e., if Ep is nonsingular). The main theorem is the following result discovered in the 1930's independently by Lutz and Nagell. Theorem 5.1 (Lutz-Nagell). Let E be an elliptic curve (3.23) with coefficients in Z. (a) If a2 = 0 and if P = (x(P), y(P), 1) is in £(Q)ton>, then x{P) and y{P) are integers. (b) For any a\, if P — (x(P)yy(P), 1) is in i£(Q)tors> then 4x(P) and Sy(P) are integers. (c) If p is an odd prime such that p | A, then the restriction to £(Q)tors of the reduction homomorphism rp : 2?(Q) —► Ep(lp) is one- one. The same conclusion is valid for p = 2 if 2 f A and a\ = 0. (d) If a\ = a3 = a-2 — 0, so that E is given by y2 = x3 + Ax + B, (5.1) and if P = (x(P),j/(P), 1) is in £(Q)tors, then either y(P) = 0 (and P has order 2) or else y{P) ^ 0 and y(P)2 divides d = -4A3 - 27B2, the discriminant of the cubic polynomial on the right side of (5.1). 130
1. OVERVIEW 131 This theorem will be proved in §4. As was mentioned in Chapter I, a consequence is that we can determine E(Q)tors completely for any particular curve. The algorithm is to put the curve in the form (5.1), to consider each square divisor y2 of d to give a candidate for y{P) and to write down any corresponding integers x such that (x,y) is an integer solution of (5.1). There can be only finitely many such, and we obtain a bound on |i?(Q)torsl- F°r eacn sucn integer solution (x,y), we can raise it to powers up to our bound on the order, effectively checking whether it is a torsion point. As a practical matter, this algorithm is often a little tedious. It is usually more efficient to use (c) to cut down the possibilities. Some examples will illlustrate. Example 1. y2 -f y = x3 — x2. There are some obvious solutions with integer coordinates, namely (x,y) = (0,0), (0,-1), (1,0), (1,-1). (5.2) Under the doubling formula (3.74), x = 0 doubles to x = 1 and vice versa. So all four points are torsion points. For this curve, A = —11 from Table 3.1. Rather than clear out the y and x2 terms to be able to use Theorem 5.Id, we reduce modulo 2, since ai = 0 and 2 \ A. Then £2(^2) contains the four points (5.2) and also 00; hence £2(^2) = Z5. By Theorem 5.1c, r2 : £(Q)tors -> ^(Z2) = Z5 is one-one. Since (5.2) has shown i?(Q)tors to be nontrivial, we conclude that i?(Q)tors — Z5. The members of E^Q^ors are 00 and the points in (5.2). EXAMPLE 2. y2 + y = x3 — x. Again there are some apparent solutions with integer coordinates, namely (*,y) = (±1,0), (±1,-1), (2,2), (2,-3), (6,-15). (5.3) From Table 3.2, we have A = 37. Since ax = 0 and 2 f A and 3 f A, Theorem 5.1c applies to p = 2 and p = 3. Over Z2 we find 5 solutions (counting 00), while over Z3 we find 7 solutions. So £'(Q)tors maps one- one into Z5 and also one-one into Z7. Thus £(Q)tors = 0. None of the points (5.3) is a torsion point. Example 3. y2 — xy + 2y = x3 ± 2x2. For this curve we calculate that A = -2713. We can try to identify £(Q)tors by using parts (a) and (d) of Theorem 5.1. To find a full list of candidates for torsion points, we can rewrite E as [y + (l-is)]2 = a:3 + f*2-*+l.
132 V: TORSION SUBGROUP OF £(Q) Making admissible changes of variables three times leads first to 3/2 = x3+fx2-x+l, then to y2 = x3 + 9x2 - 16x -f 64, and finally to y2 = x3 - 43x + 166. The discriminant of the cubic polynomial on the right side is —21313. Thus Theorem 5.1d says that y(P)2 | 21313. So y(P)2 = 1 or 4 or 16 or 256 or 1024 or 4096. Checking each case, we are led to the candidates (3,±8),(-5,±16),(ll,±32),oo. Using the doubling formula, we can check that each of these is a torsion point. We can transform back to obtain the following solutions of the original equation: (0,0), (0,-2), (-2,0), (-2,-4), (2,4), (2,-4), oo. (5.4) The torsion subgroup of the original equation therefore consists of the points (5.4) and is isomorphic to Z7. It is more efficient, however, to use Theorem 5.1c. Since 3 f A, we list the solutions of the original equation modulo 3 as (0,0), (0,1), (1,0), (1,-1), (-1,1), (-1,-1), co. By Theorem 5.1c, 2?(Q)tors is 0 or Z7. In Q we quickly see that (0,0), (0, —2), and (—2,0) are solutions. Using the doubling formula, we check that the doubled x coordinate of any of these makes a cycle after 3 steps. Therefore these points are torsion points, and 2£(Q)tors — Z7. We can use the results of the doubling to generate the list (5.4). Example 4. y2 + xy = x3 + Ax2 + x. For this curve, A = 3252. The point (—^,^) lies on the curve, and the tangent line is vertical there (since the coefficient 2y + x of dy is 0). Therefore (—\> |) has order 2 and is a torsion point with nonintegral coordinates. To limit the size of E(fy)torSi we reduce modulo 7 and find that the numbers of j/'s giving solutions for x — 0,1,...,6 are 1,2,0,1,0,1,2, respectively. Counting the point 00, we see that £7(^7) has order 8. To identify E{Q)tOTS, we replace y -f \x by y and then scale. The transformed equation is y2 = z3+ 17x2 + IQx. In §5 we shall see that an
1. OVERVIEW 133 equation of the type y2 = x(x + r2){x + s2) has Z4 0 Z2 as a subgroup of its torsion group. So in our case, the torsion group is exactly Z40Z2. Taking the 8 solutions given in §5 and transforming back, we obtain (0,0), (-$,£), (-1,-1), (-1.2), (1,2), (1,-3), (-4,2), 00 as the torsion elements for our original equation. Since 2 f A, reduction modulo 2 is a homomorphism. The group £2(^2) consists of (0,0), (1,0), (1,1), 00, and r2 maps E(Q>)t0rs onto £2(^2) with kernel 22 = {( —^, |), 00}. Table 5.1 gives examples of elliptic curves over Q with 15 different torsion subgroups. Methods for generating such examples will be indicated in §5. According to Theorem 1.7 (due to Mazur), these 15 groups are the only possibilities for a torsion group over Q. The proof of Mazur's Theorem is well beyond the scope of this book and will not be given here. E y2 = s3 + 2 y2 = x3 + X y2 = x3 + 4 y2 = x3 + 4x y2 + y = x3 - i2 y2 = x3 + 1 y2 - xy + 2y = x3 + 1x2 y2 + Ixy - 6y = i3 - 6x2 y2 + 3ry + 6y = x3 + 6ar2 y2 - 7iy - 36y = a;3 - y2 + 43zy - 210y = i3 - y2 = i3 - x 18*2 -210r2 y2 = X3 + 5X2 + AX y2 + bxy - 6y = i3 - 3x2 y2 = g3 + 337r2 + 20736a; £(Q)ton» 0 z2 z3 z4 z5 z6 z7 z8 z9 Zio Zl2 z2ez2 z4©z2 z6ez2 z8©z2 A -2633 -26 -2833 -212 -11 -2433 -2713 283417 -2935 -25310112 21236537413 26 2832 223652 220385472 Table 5.1. Examples of torsion subgroups of £(Q)
134 V: TORSION SUBGROUP OF E(Q) There are two situations where we can decide E(Q)t0rs for infinitely many given curves at once (other than the situations in §5 where we construct some families just for this decidability). The two theorems that follow use Dirichlet's Theorem on primes in arithmetic progressions and will be proved in §6. Part of the first theorem was borrowed in Chapter IV and stated without proof as Lemma 4.20. Theorem 5.2. Let E be the elliptic curve y2 = x3 -f Ax with A in Z and with A assumed fourth-power free. Then if — A is a square in Z E(Q)t0rs={ Z4 ifA = 4 otherwise. Theorem 5.3. Let E be the elliptic curve y2 = x3 + B with B in Z and with B assumed sixth-power free. Then !Z6 if B = 1 Z3 if B = -432 = -2433, or if B = [J and B ± 1 Z2 if B = cube and B ? 1 0 otherwise. 2. Reduction Modulo p We recall the definition of the p-adic norm on Q, p being a prime. If r ^ 0 is in Q, we write r = pnu/v with u and v in Z and with GCD(t/,p) := GCD(v,p) = 1. The definition of the p-adic norm is \r\p = p""n. By convention, we define |0|p = 0. This notion has the following properties: (i) \r + s\p < max{|r|p, |5|p}, with equality if \r\p ^ \s\p, (ii) \rs\p - \r\p\s\p. Property (ii) is clear. For (i) we write r = pnu/v and s = pn u'/v\ with n <n' without loss of generality. Writing \v v' J vv' with GCD(vt/,p) = 1, we obtain (i) directly. Property (i) is called the ultrametric inequality. It implies lr + s\p < \r\P + kip- If we define d(xyy) = \x - y\p} then the latter inequality implies the triangle inequality for dy and d is a metric on Q.
2. REDUCTION MODULO p 135 We say that r e Q is p-integral if \r\p < 1. By (i) and (ii), the p- integral elements form a subring of Q containing Z. Those with \r\p < 1 form an ideal in this subring; they of course have \r\p < p"1. The p-integral elements of Q can be reduced modulo p: If r = pnu/v is p-integral, i.e., if n > 0, then we define rp(r) G Zp by , x f u/v mod p if n = 0 if n > 0. Then rp : {p-integral elements} —► 2p is a ring homomorphism. In preparation for considering plane curves, we can try to use rp to get a map of the affine plane over Q to the affine plane over Zp, but the best we get is a map defined on {(r)5) I r and $ are p-integral} as rpfas) = (rp(r))rp(5))- To correct this deficiency we work with curves projectively, as follows. To define rp : P2(Q) — ^(Zp), we let rp(x,yiw) = (rp(x),rp(y),rp(ti;)), (5.5) where (x,y, w) are coordinates of the point in question chosen so that x, y, and w all have | • |p < 1 and at least one of them has | • |p = 1. Such a representative of a point in P2(Q) is said to be p-reduced. Note that if a general (x, y, w) is given, we can multiply by a suitable pn to obtain a p-reduced representative. A p-reduced representative is unique up to a factor with | • |p = 1. Therefore rp is well defined as a map of all of F2(Q) into P2(ZP). Using (5.5), we can reduce projective plane curves modulo p. Let F G Q[x,y,ti;]m be a plane curve of degree m. Multiplying the coefficients of F by a constant, we may assume that all the coefficients have | • |p < 1 and at least one has | • |p = 1. Then we can reduce the coefficients modulo p, obtaining a nonzero polynomial Fp £ Zp[x, y, w]m. Although Fp is not defined uniquely, it is defined uniquely up to a nonzero scalar. Therefore its zero locus Fp(lp) is well defined. Proposition 5.4. Let F G Q[z,y,ti;]m be a plane curve. Under the reduction homomorphism rp : P2(Q) —> ^(Zp) given in (5.5), the image of F(Q) is contained in Fp(Zp).
136 V: TORSION SUBGROUP OF £?(Q) PROOF. Normalize the coefficients of F as above, and let (x, y, w) be a p-reduced representative of a point in ACQ)- Then (x,y,w) G F(Q) ■$=> F(x,y,w) = 0 =>rp(F(x,y,w)) = 0 *=>Fp((rp(x),rp{y),rp(w)) = 0 <=>Fp(rp(x,y,w))=Q <=^rp(x,y,w)£Fp(lp). Proposition 5.5. Suppose F € Q[x,y, u;]m is a plane curve, L 6 Q[x,y,u;]l is a line, and P = (a:o.yo.u>o) is a point on L. If Fp and Lp are reductions of F and L modulo p, then the intersection multiplicities satisi v i(P,L,F)<i(rp(P),Lp,Fp). (5.6) Proof. Without loss of generality, we may assume that (xo,2/o,^o) is a p-reduced representative and that the coefficients of F and L are normalized as they are supposed to be. Choose a p-reduced representative (xfiyfiwf) of a point P' of L with [(x',y',u/)] ^ [(x0, j/o, u>o)L and form V>(<) = F(P + <^') = ^(xo + **', yo + *i/, wq + tw) = trF; + ... + tmFU, with F'r / 0. By Proposition 2.9 the left side of (5.6) equals r. Recomputing rp(t) modulo p (i.e., in 2p[t]) and applying Proposition 2.9 again, we see that the right side of (5.6) is > r. Let us apply Propositions 5.4 and 5.5 to elliptic curves E over Q. For studying E(Q), we may make an admissible change of variables y —* J//ti3, x —► x/u2 to make all coefficients of E be in Z. Then we can assume E is in Weierstrass form (3.23) with all coefficients in Z. To apply the above theory, we consider the projective form (3.23a) of E. The coefficients of E are all in Z and hence are p-integral. Also wy7 and x3 have coefficient 1. Thus passage to Ep is given simply by writing (3.23a) with coefficients considered in Zp; no preliminary normalization is needed. The discriminant of Ep is clearly given by Ap = A mod p. Thus Ep is nonsingular if and only if p f A. Our reduction map on 2?(Q) is a mapping rp : £?(Q) - £P(ZP) (5.7) by Proposition 5.4.
3. p-ADIC FILTRATION 137 Proposition 5.6. If Ep is nonsingular, then the map rp in (5.7) is a group homomorphism. Proof. Since rp(0,l,0) = (0,1,0), rp carries O to Op. We apply Proposition 5.5. Since the sum of intersection multiplicities over a line is < 3 (by nonsingularity of E and Ep)7 the proposition gives rp(PQ) = r,{P)-rP(Q). Thus rp(P + Q) = rp(0 ■ PQ) = rp(0) ■ rp(PQ) = rp(0) • (rp(P) • rp(Q)) = Op ■ (rp(P) ■ rp(Q)) = rp(P) + rp(Q), and rp is a group homomorphism. 3. p-adic Filtration Fix an elliptic curve E with Z coefficients. In order to get at the integrality or almost integrality of torsion points, it is natural to work with coordinates (x,y,l), clear denominators, and see what happens. The difficulty with this approach is that it is not so easy to take advantage of the hypothesis that (x, y, 1) is a torsion point. The clever idea that works is to use coordinates (X, 1, W). Proposition 5.6 shows that rp : E(Q) —> Ep(Zp) is a group homomorphism if p \ A. In the coordinates (X} 1,W), subgroups and the homomorphism property of rp play a more visible role, and the finite order of a torsion point becomes a usable property. When p \ A, the kernel of rp is {(x,y,w) £ E(Q) | rp(x,y,w) = (0,1,0)}. For such a point (x,y, w)y we must have y ^ 0. Normalizing, we may assume y = 1. The elements (x,l,w) 6 £(Q) that are in ker(rp) are those with |x|p < 1 and \w\p < 1. In this section we shall study the same set ^J)(Q) = {(*, l.u,) € £(Q) I \x\p < 1 and |t»|p < 1}, but without the assumption that p \ A. Lemma 5.7. Let (x, l,tu) be in £(Q). If \w\p < 1, then (x^ < 1 and K = |x|p3-
138 V: TORSION SUBGROUP OF K(Q) Proof. Putting y = 1 in (3.23a), we have w = — a\xw — a^iv2 -f- x3 + a2x2w -f- a4Xti/2 •+- a6™3. (5-8) First suppose |x|p > 1. On the right side of (5.8) the x3 term is strictly the largest relative to | • |p, since | - aixw\p < \xw\p < \x\p < \x\3p = |x3|p l-^2|p<K<i<W? = k3lp \a2x2w\p<\x2w]p<\x\2p<\x\3 = \x3\p \aAxw\ < \xw\ < \x\p < \x\3p = |x3|p \a6w\ < \w\3 < 1 < |x|3 = \x\. By the ultrametric inequality, (5.8) gives \w\p = |x3|p. This is a contradiction, since |u;|p < 1 and \x3\p = \x\3 > 1. We conclude that |x|p < 1. Now we rewrite (5.8) as w 4- aixw + a3w2 — a2x2w — a^xw2 — a^w3 = x3. (5.9) If w = 0, then x = 0 and \w\p = |x|3. If w ^ 0, then the w term is strictly the largest on the left side of (5.9), since I - a\xw\p < \xw\p = |x|p|u;|p < MP \-azw2\p<\w2\p = \w\2p<\w\p \a2x2w\p < \x2w\p = \x\2p\w\p < \w\p \a4xw2\p < \xw2\p = \x\p\w\2p < \w\p \a6w3\p < \w3\p = H3 < \w\p. By the ultrametric inequality, (5.9) gives \w\p = \x3\p lemma follows. For n > 1, define £(n)(Q) = {(*> 1, u>) € E(Q) | \w\p < 1 and |x|p < p-»}. The p-adic filtration of E^(Q) is oo £(1)(Q) 2 £(2)(Q)) 2 £(3)(Q) 2 • • • with p| £<»>(Q) - {(o, 1,0)}. n = l |x|3, and the
3. p-ADIC FILTRATION 139 The important control on a point of ^^(Q) other than the group identity is that it lies in £,(n)(Q) - E<n+1>(Q) for some n. Lemma 5.7 says that £(n)(Q) = {(*, 1,«0 € E(Q) | |W|P < p-*»}. Let R be the ring of p-integral elements of Q, i.e., members of Q with no factor of p in the denominator. Proposition 5.8. The subsets £?(n>(Q) of j?(Q) are subgroups. The function P = (x.l.w) —► z(P) = x gives a map £<n)(Q) -► pnfl. The composition of this map, followed by the quotient map pnR —> pnR/p2nR, is a group homomorphism £(n)(Q) — PnR/p2nR whose kernel is contained in 2?(2fl)(Q). Consequently the homomorphism E(n\Q)/E(2n\Q) —+ pnR/p2nR is one-one. Before coming to the proof, we mention that pnR/p2nR ~ pnl/p2nl *? Z,». (5.10) For the second of these isomorphisms, we observe that each group is cyclic and they both have pn elements. For the first of the isomorphisms, we map pnZ into pnR/p2nR and obtain a map on the quotient pnZ/p2nZ —>pnR/p2nR. This map is one-one since pnZ flp2n-R C p2nZ. To see that it is onto, let pnu/v be in pnR. Choose a and b in Z with av -f bpn = u. Then pnu n p2nb L— = p"a + £— v v shows that pna maps onto the coset oipnu/v. The proof of Proposition 5.8 will be preceded by the statement of a lemma. The lemma will be proved after the proposition.
140 V: TORSION SUBGROUP OF E(Q) Lemma 5.9. Let L be a line that meets i?(Q) in three points when multiplicities are counted, say in P\ = (xi,l,tui), Pi = (x2,l,u;2), an(* P3 = (x3,1, m)- If Pi and P2 are in £(n)(Q), then P3 is in £(n)(Q) and I*i+X2 + *»U<(P:£ ifai = ° (5.11) ix * oip _ |^p ^n m any case. Proof of Proposition 5.8. If Pi and P2 are in /^(Q), then the lemma says that P3 = PiP2 is in £<n>(Q). Since O is in £<n>(Q), so is O • PiP2 = Pi + P2. Also O Pi = -Pi is in ^n>(Q). Hence JE?<n>(Q) is a subgroup of #(Q). If P is in £(n)(Q) and x = x(P), then |x|p < p"n yields |p-n*|P = b"%k|p=Ppil*lp<l- Hence p~nx is in R and x is in pniE. Thus P —> x(P) gives a map of £(n)(Q)intopnfl. Let PiP2 = P3. Then the lemma gives x(Pi) + x(P2) 4- x(P3) € p2n/?. (5.12) If P3 = (X3,1,^3), then a little computation with (3.70) shows that op3 = -p3 = (-J—2- , i, -t-t^t ) • \ I +axx3 + a3w3 1-f aix3 + 031^3/ Hence x(Pi+P2) = x(OP3) = -: X3 1 -faix3 + a3w;3 and x(Pi + P2) + x{P3) = ~ — ^- + x3 1 + aix3 + a3iy3 aix3 + a3ti;3 = X3 1 -f aix3 -f a3ti;3 6 p3nR (5.13) since |aix3 + 031^3^ < 1 and |1 + aix3 4- 03^3^ = 1. Combining (5.12) and (5.13), we see that x(Pi + P2) = x{Px) + x(P2) mod p2ntf, and the composition P —► x(P) —► x(P) +p2nR is a homomorphism. If P = (x,l,ti;) is in the kernel of the resulting homomorphism on £(n)(Q)/£(2")(Q), then x(P) = x is in p2nfl and |ur|p < 1. Then \Ap < P"2n> and P is in £<2n)(Q). This completes the proof.
3. p-ADIC FILTRATION 141 Proof of Lemma 5.9. The first part of the proof will show that the line L is of the form w = rnx + 6 (5-14) and will give a bound for the slope m. We begin with the equation w -f a\wx -f azw2 = x3 -f a2x2w + a^xw2 + aew3 (5.15) satisfied by the three points. Case 1. Pi ^ P2. Here we substitute (xi,l,wi) and (x2,l,ttf2) into (5.15) and subtract the results. Each term of the difference is an integral multiple of x\w\ — x2w2 = (x{ - x2)w\ + x2(w\ "" ^2) = (xx - x2)(*r1 + • • • + *2~~l)w\ + (m - ™2)K-1 + • • • + wi"1)*!- (5.16) Consider all the terms of (5.15) but w and x3, all of which have 5+3/ > 4. Noting that Pi and P2 are in f?(n)(Q), we use K*i-*a)(*r1 + -"+*rViip < |u - *2|pp-B(-V3"1 < P_3"ki - *2|p when s > 0, and we use \(wx - w2){w[~l + • • • + w2-l)xs2\p < K - ^2|PP"3n(<" Vn* < p"nK - ^2|p when t > 0. Moving the Xi — x2 terms obtained from (5.16) to the right side and the W\ — w2 terms to the left side, we can rewrite our difference of equations as (wi - u/2)(l + u) = (xx - x2){x\ + xix2 + x\ + v), (5.17) where |u|p < *rn and \v\p < p~3n. Since |ti|p < 1, |1 + u\p = 1. In particular, xi = x2 implies w\ = w2. Since we have assumed Pi ^ P2, we conclude xi ^ x2. Thus the line through Pi and P2 is of the form (5.14). In view of (5.17), the slope m is given by _ W\ - W2 __ x\ + XiX2 + X2+V X\ - X2 1 + li
142 Hence Mp = V: TORSION SUBG x\ + X!X2 + x\ + v\ 1 +u = \x\ + xlx2 + xl + v\p < max{p-2n, p-2n, p-2n, p-3"} = p~2n. (5.18) Case 2. Pi = P2. The line in question is the tangent, which we know exists. Since Q C R, we can compute the tangent by implicit differentiation of (5.15). Differentiation of (5.15) and substitution of (x,l,tu) = (xi,l,t/>i) gives (1 -f aiXi + 2a3wi — a2x2 — 2a4xiwi — 3aGw2)dw = (—aiwi + 3x2 + 2a2x\wi + aAw\)dx. dw The coefficient of dw is of the form 1 -f u' with |u'L < 1. Therefore -— ax is finite. The tangent line is thus of the form (5.14), and the slope m is , -aiwi +3x? + 2a2xiwi + a4w2 given by m = . Hence 6 J 1 + u' \m\p = | — a\Wi + 3x2 + 2a2xiti>i + a4u>2|p < maxjp , p , p , p ) <P"2n. (5.19) Both cases. Thus in both cases, (5.18) and (5.19) say \m\p<p-2n. (5.20) Since Wi = mxi -f &, we have b = wi — m«i, and (5.20) gives |6|„ < p-3n. (5.21) The three points Pi,P2,Pa satisfy (5.14) and (5.15) with multiplicities. Substitution of (5.14) into (5.15) gives a cubic equation (mx + 6) + aix(mx + 6) -f a3(mx -f 6)2 = x3 + d2X2(mx -f 6) -f a4x(mx + &)2 + a6(mx + 6)3 whose roots, with multiplicities, are xi,x2,X3. Then xt + x2 -f X3 must be minus the quotient of the coefficient of x2 by the coefficient of x3, and we obtain —aim — a^m2 + a26 + 2a4m6 + 3a6m26 X) + X2 + X3 = — ; — 3 . 1 + Q2m -f (14m2 + ^6^
3. p-ADIC FILTRATION 143 By (5.20) the denominator has norm 1. Thus (5.20) and (5.21) show that |*i +*2 + x3|p < Tmx{\ai\pp'2nt p~3n} < [PlTn !f Ql =° 1 IF — u iy r J — [p zn in any case, as required. Since |xi| < p~~n and |x2|p < p~n, we deduce that \x3\p < p~~n. From tu3 = mx3 + 6, we obtain \w$\p < p~3n by using (5.20) and (5.21). Thus P3 is in £<n)(Q). This proves the lemma. REMARK. The proof of Proposition 5.8 did not make use of the full strength of the lemma since it did not isolate the case ai = 0. When ai = 0, one can go over the proof of the proposition to see that p2n can be replaced by p3n and E^2n^ can be replaced by E^3n\ In particular, ai = 0 implies that there is a well defined map £(n)(Q)/£(3n)(Q) — pnR/p3nR, (5.22) and it is a one-one homomorphism. We shall use this observation critically in the next proposition. Proposition 5.10. For each odd prime p, £(Q)t0rs n £(1)(Q) = 0. This conclusion extends to p = 2 if ai =0. Proof. Fix p. First assume aY = 0. If E(Q)tors n £(1)(Q) ^ 0, the intersection contains a nonzero element P of some prime order q. Since fC=i £(n)(Q) = {O}, we can find n such that P is in £(n)(Q) but not £(tt-H)(Q)# From qP = 0 and the homomorphism property of the map in Proposition 5.8 (as amended by (5.22)), we see that qx(P) is in p3nR. (5.23) If q / p, then x(P) must be in p3nR C p2nR, while if q = p, then x(P) must be in p3n~lR C p2nR. In either case the one-oneness in Proposition 5.8 implies P G £(2n)(Q) C £<n+1>(Q), contradiction. Now we allow a\ / 0 but assume p is odd. Suppose by way of contradiction that P = (x, 1, w) is a nonzero element of £(Q)tors H /^(Q) for the equation (5.15). Then P is a torsion point with |x|p < 1 and \w\p < 1. Also w / 0. Write [(z,l,w)] = [{z,y,l)]
144 V: TORSION SUBGROUP OF E(Q) X _ 1 by putting x = — and y = —. Then (x,y, 1) is a torsion point with w w y2 + a\xy + a3y = x3 + a2x2 + a4x + a6. (5.24) We make an admissible change of variables, putting x = -x' and y - -Ay' - a\xf). Substituting into (5.24), we see that (x,)y,) 1) is a torsion point of yf2 + 8a3y' = x'3 + (4a2 + a?)x'2 + (16a4 + Sa^x1 + 64a6. The point (x',l,'ti/) with [(x',1,^')] = [(*',y',l)] has x; = — and y; = —, and it is a torsion point of wf w' + 8a3u/2 = x/3 + (4a2 + aj)x/2w;' + (16a4 + Saxa3)xfwf2 + 64a6u;/3. (5.25) This equation has integer coefficients, and its x'w' term has coefficient 0. From the previous paragraph it has J?(Q)to™n£?(1)(Q) = 0. We shall obtain a contradiction by showing that (x;, 1/ w') is in this intersection. In fact, we know (x;, l,u/) is a torsion point. Also w 8 + 4a!X and p odd and \x\p < 1 imply |8 -f 4aix|p = 1, so that \w'\p = \w\p < p-3. By Lemma 5.7, |x'|p < p"1. Thus (x',1,1//) is in ^(1)(Q). This contradiction completes the proof. 4. Lutz-Nagell Theorem In this section we shall prove Theorem 5.1. The notation is as in §1. Proof of Theorem 5.1a. Under the assumption that ai = 0 let (x,y, 1) be in i£(Q)tors- We are to prove that x and y are in Z. First we handle y. Without loss of generality, we may suppose y ^ 0. Then [(x,y, 1)] = [(X.ljW)], where X = x/y and W = 1/y. Fix any prime p. By Proposition 5.10, the torsion point (X, 1,W) is not in E^\Q) By Lemma 5.7, \W\P > 1. Thus \y\p = W < 1. Since this condition p holds for all p, y is an integer. Regarding y as an integer constant in (3.23b), we see that x satisfies a monic cubic polynomial equation with integer coefficients. For such an equation, the rational roots are integers. Thus x is an integer.
5. CURVES WITH PRESCRIBED TORSION 145 PROOF OF THEOREM 5.1b. Without the assumption that ai - 0, let (x,y, 1) be in E(Q)ton. Then (x',y', 1) = (4x,8y, 1) satisfies y'2 + 2a1x'y' + 23a3y' = x'3 + 22a2x'2 + 24a4x' + 26a6, and (iV', 1) = (x',y' + a!x', 1) satisfies y"2 + 8a3y" = x"3 + (4a2 + a2)x"2 + (16a4 + 8a1a3)x// + 64a6. By Theorem 5.1a, x" and y" are integers. Tracing back, we see that x' and y1 are integers. Hence 4x and 8y are integers. Proof of Theorem 5.1c. When p \ A, we have ker(rp) = /^(Q). Under our hypotheses, Proposition 5.10 says that £(Q)torsn£(1)(Q) = 0. Thus the restriction of rp to £7(Q)tors is one-one. Proof of Theorem 5.Id. Let E be of the form (5.1). The doubling formula (3.74) gives _ x*-2Ax*-8Bx + A> *Kn ~ 4(x3 + Ax + B) ' \/{x\ and we write this as x(2P) = ttt4-- Since 6(x) = y2, we have 4o(x) i/(x) = 4y2x(2P). (5.26) Now x, x(2P), and y2 are integers by Theorem 5.1a, and we see from (5.26) that i/(x) is an integer and that y2 | J/(x). Thus y2 divides both i/(x) and S(x). Direct calculation gives (3x2 + \A)u{x) - (3x3 - hAx - 27B)6(x) = 4A3 + 27£2 = -d. Since y2 divides both v(x) and £(x), y2 divides d. 5. Construction of Curves with Prescribed Torsion In this section we study how to construct elliptic curves E over Q with prescribed features in 2?(Q)tors- The first part of the section shows how to produce elements of a given order. Then we shall discuss how to make ^2n 0 Z2 be a subgroup of 2?(Q)tors- Finally we summarize matters by saying how Table 5.1 in §1 was constructed.
146 V: TORSION SUBGROUP OF E(Q) In studying behavior at a single point, we may translate the curve to make the special point be P = (0,0) in the affine plane. After this translation, we have a$ = 0, and the curve is y2 + aixy + a3y = x3 + a2x2 + a4x. (5.27) From §111.5 the curve is singular if and only if a3 = a4 — 0. Assume that (5.27) is nonsingular. Then P = (0,0) is of order 2 if and only if the tangent is vertical there. Vertical tangency occurs when, in the implicitly differentiated form of the curve, the coefficient of dy is 0. In (5.27), dy has coefficient 2y + a\x + a3. Thus P has order 2 if and only if a3 = 0 (and therefore a4 / 0). Assume next that (5.27) is nonsingular and that as ^ 0. If we make the admissible change of variables (x,y) = (x/,2// + a3"1a4x/) in (5.27), then P remains at (0,0) and we are led to y'2 + (ai + 2a3"1a4)xV + a3y' = x/3 + (a2 - aya^la4 - a^2a\)x'2. Changing notation, we can rewrite this as y2 + axxy + a3y = x3 + a2x2 (5.28) with no x term. With P = (0,0), the numbered formulas of §111.4 give -p = (0,-a3) (5.29a) and 2P = (-^, a^ - a3) = (~a2, axa2 - a3). (5.29b) Since 3P = O if and only if -P = 2P, we see that P = (0,0) in (5.28) has order 3 if and only if a2 = 0. Now assume that we have arrived at a nonsingular (5.28), i.e., a3 ^ 0, and suppose a2 ^ 0. Then we can eliminate one of the parameters. Namely if we make the admissible change of variables (x,y) = (x7«2,y7«3) with u = a^la2, then P remains at (0, 0) and we are led to y1 + a^axa2xy + <q2a\y = x3 + a^2a\x2
5. CURVES WITH PRESCRIBED TORSION 147 with equal coefficients for the y and x2 terms. Changing notation, we can rewrite this as E(b,c) : y2 + (l-c)xy-by=x3-bx2. This is called the Tate normal form. It is nonsingular if and only if 6 = 0. Direct calculation shows that the discriminant is A(6,c) = (l-c)463-(l-c)363-8(l-c)264 + 36(l-c)64-2764+1665. From (5.29) and (3.70), we obtain ^ = (0,0) -P = (0,6) 2P = (6,6c) -2P = (6,0). (5.30a) Using (3.72) and (3.73), and then (3.70), we find 3P = (c, 6 - c) -3P = (c, c2). (5.30b) From (5.30) we can easily choose 6 and c to make P = (0,0) have AP = O or 5P = O or 6P = O. To achieve these three conditions, we just make 2P = -2P or 3P = -2P or 3P = -3P. From (5.30) we obtain immediately 4P = 0 <=> c = 0 5P = 0 <=> b = c (5.31) 6P = O <=> 6 = c2 + c. Let us turn to the question how to make 7.2n 0 Z2 be a subgroup of £(Q)tors- The tool will be Theorem 4.2. Since Z2 0 Z2 will have to be a subgroup, we can assume from the outset that E is of the form y2 = (x - a)(x - p)(x - 7) with <*,/?,7 in Z. Translating (7,0) to (0,0) and changing notation, we can write the curve as y2 = x(x-a)(x-p). (5.32) For Z4 0 Z2 to be contained in E(Q)torSl one of the points (zo,0) with xo 6 {0,a,/?} must be the double of something. Suppose (0,0) is the point that is to be a double. Theorem 4.2 says that the necessary and
148 V: TORSION SUBGROUP OF K(Q) sufficient condition is that 0 — 0, 0 — a, and 0 — 0 are squares. Thus any curve y2 = x(x + r2)(x + 52) has Z4 0 Z2 C i£(Q)tors- The subgroup Z4 0 Z2 consists of the elements (0,0), (-r2,0), (-52,0), (rs, ±rs(r + 5)), (-rs} ±rs{r-s)), 00. This construction was used in connection with Example 4 in §1. The application of Theorem 4.2 can be repeated, this time to be used on (rs,rs(r + 5)), and the result is a condition for Zs 0 Z2 to be a subgroup of E(Q)tors- To summarize matters, let us say how Table 5.1 in §1 was constructed. For as many cases as possible, the examples are from Theorems 5.2 and 5.3. For Z5 we used (5.31), and for Z7,Zs,Z9 we used extensions of (5.31) that deal with IP = O, 8P = O, 9P = O. For Z10, we started with the standard form with P = (0,0) of order 5 and found a case where P was the output from the doubling formula (3.74). For Z12, the same technique in principle should work with P = (0,0) of order 6, but we simply quoted a known example. The examples with Z40Z2 and Zg0Z2 were obtained by the method in the previous two paragraphs, and the example for Z6 0 Z2 was obtained by starting with the form (5.31) for 6P = 0 and looking for a case that had three elements of order 2. 6. Torsion Groups for Special Curves Our objective in this section is to prove Theorems 5.2 and 5.3 in §1. Each theorem requires a lemma about Ep(lp) for certain primes p. Also we shall use the form of the doubling formula (3.74) when specialized to a curve y2 = x3 + Ax -f B: _ X<-2Axi-8Bx + A> X{2n~ 4(x* + Ax + B) ■ (5,33) And we shall use Dirichlet's Theorem that there are infinitely many (positive) primes an + b if GCD(a,6) = 1; this theorem will be proved in Chapter VII. Lemma 5.11. Let Ep be the curve y2 = x3 + Ax over Zp, and assume that p f A, p > 7, and p = 3 mod 4. Then Ep(lp) has exactly p + 1 points.
6. TORSION GROUPS FOR SPECIAL CURVES 149 PROOF. We start from the known result that p = 3 mod 4, implies that —1 is not a square modulo p. For x -£ 0, consider the pair {x, — x}. When these elements are substituted into E, we obtain x3 -f Ax and —(x3 + Ax). If the answers are 0, each one has a square root, and we get one solution from each. If they are nonzero, exactly one is a square (since — 1 is not a square), and it has two square roots y. So in either case, the pair {x, —x} gives us two solutions. Thus the nonzero x's give us p— 1 solutions in all. For x = 0, we get one more solution (0,0), and oo gives us one additional solution. Thus Ep(2p) has p + 1 points. Proof of Theorem 5.2. The main step is to show that |#(Q)tors| divides 4. By Theorem 5.1c, for all sufficiently large primes p, |£(Q)torsl divides \EP(2P)\. By Lemma 5.11, |£(Q)tors| divides p + 1 for all sufficiently large primes p with p = 3 mod 4. Let us see that 8 does not divide |£(Q)torsl- By Dirichlet's Theorem, we can choose a prime p as in the previous sentence with p = 3 mod 8. If 8 divides |£(Q)t0n|. tnen 8 | (p + 1). But p = 3 mod 8 means that p-f 1 = 4 mod 8; so 8 \ (p -f 1), contradiction. Now let us see that 3 does not divide |£(Q)tors|- By Dirichlet's Theorem, we can choose p large with p = 7 mod 12. Then p = 3 mod 4. Thus 3 | |£(Q)tors| implies 3 | (p + 1). But p+ 1 = 8 mod 12 implies p-f 1 = 8 mod 3; so 3 f (p+ 1), contradiction. Finally let us see that no odd prime q > 3 divides |2£(Q)tors|- By Dirichlet's Theorem, we can choose p large with p = 3 mod Aq. Then p = 3 mod 4. Thus q | |£?(Q)t0rs| implies q | (p + 1). But p + 1 = 4 mod Aq implies p + 1 = 4 mod q\ so q \ (p -f- 1), contradiction. This completes the proof that |£(Q)tors| divides 4. The torsion group will then contain Z2 = {(0,0), 00} as a subgroup, and it will be Z2 0 Z2 if and only if x3 -f Ax splits over Q, i.e., if and only if — A is a square. Thus the only question is when (0, 0) is the double of something (so that the torsion group is Z4 rather than Z2). We can check directly for A = 4 that (2,4) doubles to (0,0). Consider the equation 2(x,y) = (0,0) for other A. By (5.33), we have 0 = xA-2Ax2 + A2 = (x2-A)2. Thus x2 = A. Since A is fourth-power free, x is square free. But y2 = x(x2 -f A) = x(x2 -f x2) = 2x3 then shows that no odd prime can divide x. So x = ±1 or ±2. Checking the possibilities, we see that x = ±2 and A = A. Lemma 5.12. Let Ep be the curve y2 = x3 + B over Zp> and assume that p f A, p > 5, and p = 2 mod 3. Then £p(Zp) has exactly p + 1 points.
150 V: TORSION SUBGROUP OF E(Q) Proof. Let p = 3n + 2. The multiplicative group Z* has order p- 1. Since 3 \ (p — 1), no element has order 3. Therefore the homomorphisrn a —► a3 on Z£ is one-one, hence onto. Thus each element of Zp has a unique cube root. For each y in Zp, the element y2 — B has a unique cube root, which we can take as x. In this way we obtain p solutions. Adjoining oo, we see that EP(ZP) has p + 1 points. Proof of Theorem 5.3. The main step is to show that ^(QJtorsI divides 6. By Theorem 5.1c, for all sufficiently large primes p, |£,(Q)tors| divides \Ep(lp)\. By Lemma 5.12, |£(Q)to™| divides p + 1 for all sufficiently large primes p with p = 2 mod 3. Let us see that 4 does not divide |l?(Q)tors|- By Dirichlet's Theorem, we can choose a prime p as in the previous sentence with p = 5 mod 12. If 4 divides |#(Q)tors|, then 4 | (p + 1). But p = 1 mod 4 means that p + 1 = 2 mod 4; so 4 f (p -f 1), contradiction. Now let us see that 9 does not divide |£(Q)tore|- By Dirichlet's Theorem, we can choose p large with p = 2 mod 9. Then p = 2 mod 3. Thus 9 | |£(Q)tors| implies 9 | (p + 1). But p + 1 = 3 mod 9 implies 9 f (p+ 1), contradiction. Finally let us see that no odd prime q > 3 divides |£(Q)tors|- By Dirichlet's Theorem, we can choose p large with p = 2 mod Zq. Then p = 2 mod 3. Thus q | |£(Q)t0rsl implies q \ (p + 1). But p -f 1 = 3 mod 3g implies p + 1 = 3 mod q\ so q f (p+ 1), contradiction. This completes the proof that |£(Q)tors| divides 6. The torsion group has an element of order 2 if and only if x3 + B has a first-degree factor over Q, i.e., if and only if B is a cube. Thus the only question is when the torsion group has elements of order 3. Such a point P = (x,y) is characterized by 2P = — P. Moreover, the x coordinate determines everything, since 2P = -fP is impossible for P ^ O. By (5.33) the question is whether x4 - SBx _ 4(x3 + B)~* has any rational solutions x. Clearing fractions, we have 4x4 + ABx - x4 - SBx x4 = -ABx. One solution is x — 0, which gives y2 = B\ so Z3 occurs if B is a square. The only other possibility is x3 = — 4B. Then y2 = -35. Consequently B < 0. Since 5 is sixth-power free, the only possible prime divisors of B are 2 and 3. We readily find B = -2433. So Z3 occurs if and only if either B is a square or B = —2433.
CHAPTER VI COMPLEX POINTS 1. Overview We shall consider meromorphic functions on C periodic with two periods that are linearly independent over R. Such functions are called elliptic functions. We fix the periods. Among these functions, there is a basic one p(z)% and p(z) satisfies the differential equation p'2 = 4p3 - g2p - 03 for certain constants g2 and #3 depending on the periods. The linear independence of the two generating periods forces the cubic 4x3— g2x—g% to have nonzero discriminant. For t G C in a parallelogram II generated by the periods, we form the parametrically defined curve Then we have y2 = 4x3 - g2x - g3i and the discriminant is nonzero. In other words, p provides us with a parametrization of part of a curve E over C, and E is for all intents and purposes an elliptic curve. Taking into account poles of p(t) and using the projective plane over C rather than the affine plane, we shall see that the parametrization is onto £"(C). In fact, the map is biholomorphic. There is a natural addition on our fundamental parallelogram obtained from addition in C, and we shall see that this operation corresponds to the group law for E(C). Finally we shall see that every elliptic curve over C can be realized by this construction. This is the inversion problem, and detailed motivation for its solution appears in §5. 151
152 VI: COMPLEX POINTS 2. Elliptic Functions A function / : C —► C U {00} is doubly periodic with periods u>\ and u;2 if ^1 and u;2 are linearly independent over R and if f(z -fa;i) = /(z) = f(z +<jj2) for all z G C. An elliptic function is a meromorphic doubly periodic function. Fix u>i and lj2- The corresponding elliptic functions form a field. The parallelogram with vertices 0, u)\, LJ2> w\ +w2 is called the fundamental parallelogram II. To be more precise, we shall insist that II contain the two sides adjacent to the origin, as well as the origin, but not the other two sides and three vertices. Any C-translate a + II of II is a period parallelogram. With our precise definition of period parallelogram, we can conclude the following: For any period parallelogram a + 11, any point in C is congruent modulo Zu\ -f 2lj2 to one and only one point of a+ 11. Proposition 6.1 (First Liouville Theorem). There exists no noncon- stant elliptic function without poles. PROOF. Such a function would have to be bounded on the closure fl, hence entire and bounded on C. Corollary 6.2. If two elliptic functions have the same poles with the same respective principal parts, then the functions differ by a constant. Proposition 6.3 (Second Liouville Theorem). If f(z) is an elliptic function with no poles on the boundary C of a period parallelogram a + Il, then the sum of the residues of f(z) in a + II is zero. PROOF. By the Residue Theorem, the sum in question equals The integrals over opposite sides of a -f II cancel (since / is the same at congruent points and dz introduces a minus sign on one of two matching sides), and thus the integral is 0. Remark. If/ is elliptic, it has only finitely many poles on a bounded set, and thus there is always an a for which the proposition applies. Corollary 6.4. If f(z) is a nonconstant elliptic function, then either / has more than one pole in a + II or else / has one pole and that pole has order > 1. Proof. There has to be a pole, and the sum of the residues is 0.
3. WEIERSTRASS p FUNCTION 153 Corollary 6.5 (Third Liouville Theorem). Suppose f{z) is an elliptic function with no pole or zero on the boundary C of a period parallelogram a + II. Let {rrii} be the orders of the various zeros in a 4- II, and let {rij} be the orders of the various poles. Then ]£,• mt = £• rij. Proof. Since / is meromorphic, the residue of at 20 is 1 n if n > 0 is the order of a zero of / at zq 0 if / has no zero or pole at zq — n if n > 0 is the order of a pole of / at zq. Summing over the points of the period parallelogram and applying Proposition 6.3 to , we obtain the corollary. The order of an elliptic function is the number of poles in a -f II, counting multiplicities. In Corollary 6.5, the order is J2jnj' Consequently the number of zeros of f(z), counting multiplicities, equals the order of f{z). Corollary 6.6. Suppose f(z) is an elliptic function of order m and or + II is a period parallelogram. For each c £ C, / takes on the value c exactly m times, counting multiplicities. Proof. Apply Corollary 6.5 to f(z) — c. 3. Weierstrass p Function Fix lj\ and u>2 in C linearly independent over R, and let II be the fundamental parallelogram. We shall define a particular nonconstant elliptic function with periods u>\ and U2} thereby proving existence of nonconstant elliptic functions. Then we shall develop some properties of this function. Write A = {mu>\ 4- n^2 | ^,w G Z}; A is called the period lattice. The Weierstrass p function relative to A is defined by - — V I 1 - - P(') = j + E 7TT«-^ ■ («•!) u>GA We shall make repeated use of the following fact from complex variable theory: If a sequence of functions analytic in an open set converges uniformly, then the limit is analytic, and limit and derivative may be interchanged.
154 VI: COMPLEX POINTS Lemma 6.7. If 5 is a real number > 2, then converges. Proof. Let ft be the union of the four A translates of II that surround the origin: ft = n u (-lj1 + n) u {-u2 + n) u {-ux - u2 + n). The boundary of ft is compact and does not contain 0, and we can choose c > 0 so that |2| > c for all z in the boundary of ft. If \m\ > \n\ > 0, we then have I ft I \mu\ -f 71CJ2I > |"i| \w\ -\ ^2 > c|ml» (6-2) I m I since z — u>i -f —^2 lies in the boundary of ft. Now consider ]T)u,^o M-*- ^ we write u> = mwj -f nu>2, the terms with n = 0 contribute 2|c^i |""* X)m=i m~* < °°» anc* tne terms w^n m = 0 contribute 2|c^2|~~* 5Z^^Li rl~* < 00. By (6.2), the terms with lml > \n\ > 0 contribute ]T |mu;i H-nu^r* < c"5 X) 1^1"* = 4c"* ^ m",+1 < 00, |m|>|n|>0 |m|>|n|>0 m=l and the terms with \n\ > \m\ > 0 similarly make a finite contribution. The lemma follows. Proposition 6.8. If F is any finite subset of A and if the terms corresponding to F are omitted in (6.1), then the resulting series converges absolutely uniformly on any compact subset of C — (A — F). Consequently p(z) is meromorphic in C, its only poles are double poles at the points of A, and p'(z) may be computed term by term. Proof. We may assume F contains u> = 0. The sum for A — F is *2 VPU-w)2 a/2/ ^ 2z-Z-
3. WEIERSTRASS p FUNCTION 155 and we have C " M3 2z a; a;(z —a/) on our compact set. By Lemma 6.7, X^^ga-f O* ^ °°' Therefore the absolute uniform convergence follows by the Weierstrass M-test. The fact from complex variable theory at the beginning of the section shows that the limit is meromorphic. The poles are as indicated because of the convergence, and the same fact from the beginning of the section shows we can compute p'(z) term by term. Proposition 6.9. The function p(z) is an elliptic function with u\ and o;2 as periods, and p{—z) — p(z). The order of p(z) is 2. PROOF. We have p(—z) = p(z) since the right side of (6.1) is unchanged if z is replaced by — z and u is replaced by — u>. Differentiation of (6.1) gives o;€A V ' and this is certainly doubly periodic. Hence p'{z) is elliptic. For w G A, we differentiate p(z -f u) — p(z) and obtain p'{z + u>) — p'{z) = 0. Therefore p(z + u;) — p(z) = C. Evaluating at z = — \u\ (where there is no pole, according to Proposition 6.8), we obtain c = p(£"i) - p(-£"i) = p(I^i) - p(i^i) = o. Hence p(z -f tt>i) = p(^). Similarly p(z + u>2) = p{2)- Since p is meromorphic, p is elliptic. Since z = 0 is the only pole of p(z) in n, the order of p(z) is 2. Theorem 6.10 Any even elliptic function (relative to periods uj\ and LJ2) is a rational function of p(z). Any elliptic function / is of the form f(z) = g(p(z)) -f p'(z)h(p(z)) with g and h rational. Preliminary remarks. The second statement follows from the first since f = fe + f0 with fe(z) = \{f(z) + /(-*)) even and with f0(z) = f fo \{f{z) — f{—z)) odd, and since f0 — p' • — with —f even. The proof of P P the first statement will be preceded by a lemma.
156 VI: COMPLEX POINTS Lemma 6.11. (a) For any ti € C, the elliptic function p(z) — u has within II either two simple zeros or one double zero. (b) The zeros of p'{z) within II are \uiy \u2, and \{u\ + ^2), all simple. (c) The values u^ - p(^u>i), u2 = p(^2), and u3 = p(^(u>!+u>2)) are the exact u's within II where p(z) — u has a double zero, and «i, 1*2, «3 are distinct. Proof. (a) Since p(z) has order 2, we can apply Corollary 6.6. (b) The function p'(z) has order 3. Hence it has at most 3 zeros. Since p'(z) is odd and periodic, we have f/(iw,) = -p'(-iw,) = -*/(«! " ±«l) = -p'far). Thus ^ux is a zero. Similarly |u>2 and \(w\ +^2) are zeros. (c) If p(z) — u has a zero at zq , the zero is double if and only if p'(zo) = 0. By (b), p(z) — u can have a double zero only at \w\, ^u2l ^(^1 + ^2). and in these cases u clearly must be u\, 1*2, t/3, respectively. Conversely if u = u\, then p(z) — u\ is 0 at z = \wi> and also fp'(\w\) = 0. So we have a double 0. Similar remarks apply to ~u2 and to \{u\ +^2). Finally if at least two of U\y u2i U3 were equal, say to «o, then p(z) — uo would have at least two double zeros, in contradiction to (a); we conclude that w1jw2jw3 are distinct. Proof of Theorem 6.10. We are to prove that any even elliptic function is a rational function of p(z). Let f(z) be even elliptic. Taking into account the evenness, we shall list "half the zeros and poles, temporarily ignoring what happens at z = 0. Let f(a) = 0. First suppose a G II and a £ {0, \u\> ^u>2, \{w\ + u>2)}- Let a* be the symmetric point, the point in II congruent to —a: {u\ -f w2 — a if a is interior (and not \(w\ -f w2)) u>i — a if a is on the side to lj\ (and not \w\) (6.3) u>2 — o> if a is on the side to u2 (and not ^2). If a has order m, so does a*: In fact, /(a* - z) = /(period - a - z) = f(-a - z) - f(a + z). If /(a + z) = amzm+higher, then /(a* + z) = f(a -z) = am(-z)m + higher.
3. WEIERSTRASS p FUNCTION 157 Next suppose a is one of |u>i, |u;2, ^1+0/2). Say a = ^cj. Then we show a has even order. In fact, /(iu, - z) =/(-I*-*) = /(iu, + z) shows / is even about a = £u>. Hence the order is even. A similar argument applies to poles. If / has a pole at a € II with a £ {0, ^a;i, ^u>2, ^(^1 +^2)}) then / has a pole at a* of the same order. Also the order of a pole at any of \w\y \u)2, ^(^i + ^2) is even. Now we list "half the zeros and poles of f(z). Let {a,} be a list of the zeros of f(z) in II other than 0, %u)\, ^u;2, \{u\ + u^), each taken with its multiplicity, but with only one taken from each pair a, a*. For any zero among ^ux, ^u;2j ^(^1 + w2), we list the point half as often as its multiplicity. Similarly let {6;} be a list of "half" the poles in II other than 0. Since all the a, and bj are nonzero, p(at) and p(bj) are finite for all i and j. Thus it makes sense to define „r v _ TIM^) - p(°0) g{)~TlM*)-p(i>i)Y We claim that g has the same zeros and poles as /, counting multiplicities. Since the only poles of the numerator and the denominator are at z = 0, the only other zeros and poles of g(z) come from zeros of the numerator and the denominator. Consider a zero zq of the numerator, a point where p(zq) = p(a,-). If a, is any of ^u)\t ^u>2, or |(cji + 0*2)1 then Lemma 6.11 says p takes on the value p(a;) twice at that point and nowhere else. So z0 = a, and p(z) — p(a») has a zero of order two at z0. Taking into account repeated factors for this a,, we see that / and g have a zero of the same order at Next suppose a, is not ^lji, ^u;2, or \(w\ -f u^). We have p(at*) = p(a;), so that p(z) — p(a,-) has distinct zeros at a,- and a,*. The lemma says these zeros are simple. Taking into account repeated factors for this a,-, we see that / and g have a zero of the same order at z0. Thus / and g have the same zeros (including their orders), apart from z = 0. Similarly they have the same poles (including their orders), apart from z = 0. By Corollary 6.5, they have the same order of zero or pole at z = 0. Consequently f/g is entire and therefore constant, by Proposition 6.1. This completes the proof.
158 VI: COMPLEX POINTS Theorem 6.12. The function w = p(z) satisfies the differential equation (*)- 4w3 - g2w - g3, where g2 and #3 are constants depending on A in the following way: Gm = Gm(A) = Yl Az for m > 3 (=0 for m odd) (6.4) <*>GA 92 = 172(A) = 6OG4 (73 = (73(A) = 140G6. l ' ' Remarks. The series for Gm is absolutely convergent by Lemma 6.7. The theorem will be proved after Lemma 6.13. Lemma 6.13 (Weierstrass Double Series Theorem). Suppose fn{z) = Z)jbLoafc (z "~ zo)fc *s analytic for \z — z0\ < r and n > 0, and suppose that the series n=0 + [a(0x) + flW(r - *0) + 41 \z - *0)2 + • • •] + ... + [«(on) + «(in)(^ - *o) + a2n\z - z0)2 + ...] -f ... is uniformly convergent for |z — 2tq| < p for each p < r. Then the coefficients in any column form a convergent series. Moreover, if ^=i>: 00 n=0 is the sum of the kth column coefficients, then ]>^£L0 Ak(z — z0)k is the Taylor series for F(z) about 2 = 20, and it converges for \z\ < r. Proof. By the given convergence, F(z) is analytic for \z — zo\ < r and hence is given by a convergent Taylor series about z0. Its kth Taylor coefficient is n=0 n=0 by the complex variables fact after (6.1), and the lemma follows.
3. WEIERSTRASS p FUNCTION 159 Proof of Theorem 6.12. We are going to apply Lemma 6.13 to i.,c A N v ' ' in a small disc so that about z 1 (z-i p(*)- = 0. u)*z 1 z* weA Here 1 „ * -El u;€A u;4 w*+2 Thus if we define Gm by (6.4) for m > 3, the lemma says that the series for Gm is convergent (which we know already) and p(*)-4 = f>+l)G*+2z*. Z* We see directly from the definition (6.4) that Gm = 0 for m odd. Therefore p(z) = L + 3G4*2 + 5G6*4 + 7G8*6 + . • •. By direct calculation we find p'(z) = ~ + GG4z + 20G6*3 + 42G8*5 + 0(z7) p'(z)2 = A - 24G44r - 80G6 + (36G2 - 168)*2 + 0(z4) p{zf = 1. + 6G4 + 10G6z2 + 0(z4) p(*)3 = ~ + 9G4^ + 15G6 + (21G8 + 27G2>2 + 0(z4) p'{zf - 4p(z)3 + 60G4p(z) + 140G6 = 0(z2). The left side of the last equation is therefore without poles or constant term. By Proposition 6.1, it is 0. Defining <}i and g3 by (6.5), we obtain the theorem.
160 VI: COMPLEX POINTS Theorem 6.14. The map of C/A into P2(C) given by { (6.6) (0,1,0) forzGA v ; is holomorphic and carries C/A one-one onto the projective plane curve E(C), where E is given in affine form by y2 = 4x3 - g2x - g3. Remark. Recall from §11.1 that our various systems of affine local coordinates make P2(C) into a complex manifold. PROOF. For z ^ 0, the image is (p(z), p'(z)} 1), which becomes (p(z),p'(z)) in the affine local coordinate system given by 4> = J. Since each coordinate is analytic in z, our map (6.6) is holomorphic for z ^ 0. Near z = 0, we use the affine local coordinate system given /l 0 0\ by <£ = ( 0 0 II, and the map is given by 0 1 0, Mil i JL\ _ (£WL _M Vp'oo1 • pw Woovco; 0_ (0,1,0) — (0,0). Each coordinate is analytic in a punctured disc about 0 and is continuous at 0, hence is analytic in a disc about 0. Thus (6.6) is holomorphic. It is clear from Theorem 6.12 that the image is contained in E(C). Suppose z\ and z2 in IT map to the same point. Then p(*i) = p(z2) and p'(*i) = p'C2^) and zx ^ 0, z2 ^ 0. Since p has order 2, p(zl) = p(z2) with 2i ^ z2 implies z2 — z\*y in the notation of (6.3). But then pf(z2) = p'(*i*) = p'(period - zi) = p'(-zx) = -p'(*i). Since also pf(z2) = p'(zi), we see that p'^i) = p'(z2) = 0. By Lemma 6.11, *! and z2 are members of {\^\, ^w2t |(u>i + u>2)}. Consequently zi* = zi, ^2* = z2. Since Z2 = 2:1*, we obtain z\ — z2i contradiction. We conclude that the map (6.6) is one-one. Finally we show that the map (6.6) is onto. Let (x,y,l) £ E(C) be given. Since p has order 2, we can find z (clearly ^ 0) with p(z) = x. Then also p(z*) = x. The equation y2 = 4x3 - g2x -g3 = 4p(z)3 - g2p(z) - flfe = p'(*)2 says either that y = p'^), in which case z maps to our point (x,y, 1), or else y = -p'(z), in which case z maps to (x,—y,l). In the latter case, p'(z*) = —p'(z) implies z* maps to (x,y, 1).
3. WEIERSTRASS p FUNCTION 161 Theorem 6.15. The cubic polynomial 4w3 — g2w — g3 factors as 4w3 — g2w — g3 = A(w — u\)(w — u2)(w — u3)} (6.7) where ui = p(^Ui)} u2 = lj(^(lj2)} u3 = u(j(u;i +v2)). Consequently the plane curve E : y2 = 4x3-g2x-g3 (6.8) is nonsingular. Proof. For the first statement we go over the proof of Theorem 6.10 for the even elliptic function pf(z)2. Since the only pole in II is at z — 0 and since the only zeros of p'{z) are simple zeros at Ux,u2iu3, the proof shows that u/(*)2=cn(P(z)-«o- t=i Subtracting the expression in Theorem 6.12, we find that every value p(z) of the p function satisfies 3 4p3 - g?P -9s = C]J(p - u{). izzl This identity is impossible unless C — 4 (since nontrivial cubics have only 3 roots). Thus the cubics match. This proves the first statement in the theorem. The second statement follows from Lemma 6.11c and Proposition 3.5. The nonsingularity of E in Theorem 6.15 allows us to define a complex manifold structure on E(C) in a natural way. About (x,y, 1) we define E(x,y) = y2 - (4x3 - g2x - g3). We have 9E ,,„ 2 dE „ ^ = -(12*2-«,,), ^ = 2y. When y^O, the Implicit Function Theorem allows us to define y = f(x) implicitly so that the curve is (x,/(x),l). A chart is given by (x,/(x),l)-*. BE When -£— ^ 0, the Implicit Function Theorem allows us to define ox x = g(y) implicitly so that the curve is (g(y)iyi 1). A chart is given by (0(2/)>y,i)-*y-
162 VI: COMPLEX POINTS Finally we have to consider behavior about (0,1,0). About (x}\yw) we define e(x, w) = w — (4x3 — g2W2x — ^u;3). de I Then — ^ 0, so that w = h(x) implicitly and the curve is ^1(0,1,0) (x, 1, h(x)). A chart is given by (x, l,iu) —► x. We readily check that these charts make E(C) into a complex manifold such that inclusion into /^(C) is holomorphic and such that any holomorphic mapping into P2(C) with image in E(C) is holomorphic into E(C). Relative to this complex structure on E(C)} we have the following corollary. Corollary 6.16. The inverse of the one-one map C/A —► E(C) given in (6.6) is holomorphic. Proof. About points (x, y, 1) where y ^ 0, the coordinate function is (x,y, 1) —► x. Thus the coordinate version of the map (6.6) is z —► p(z) about points zq where p'(z0) / 0. The invertibility condition for this function (by the one-dimensional Inverse Function Theorem over C) is that p'(z0) ^ 0. So we have invertibility. About points (x, y> 1) where y = 0 (i.e., p'(zq) = 0), we are considering z — \u\, |cJ2, \{y>\ + ^2)- The coordinate function is (x,y,l) —► y. Thus the coordinate version of the map (6.6) is z —► p'{z) about z = ^lj\, ^U2j \{u\ +^2). The invertibility condition is that p"{zo) ^ 0. But p"(zo) = 0 would mean p' has a double 0 at zq} in contradiction to Lemma 6.11b. About (0,1,0), we are considering z = 0. The coordinate function is (x, l,ti>) —♦ x. Thus the coordinate version of the map (6.6) is z —► p(z)/p'(z). This has a simple zero at z = 0, and thus its derivative is nonzero at z = 0. This completes the proof. 4. Effect on Addition We continue to assume that u)\ and lj2 in C are linearly independent over R, and we continue with the notation A, II, and E as in §3. We write <p for the biholomorphic map <p : C/A —► E(C) given by (6.6). Theorem 6.17. The biholomorphic map ip : C/A —► E(C) is a group isomorphism.
4. EFFECT ON ADDITION 163 REMARKS. The proof will be preceded by Lemma 6.18. Had we not already proved that the operation (P11P2) —*• Pi + Pi m E(C) *s associative, then Theorem 6.17 would give us a proof. This was Poincare's approach, and as a consequence he obtained associativity for the k rational points of elliptic curves defined over Jk, provided the field k is a subfield of C. Lemma 6.18. If / : C/A x C/A —► C/A is analytic in each variable and continuous jointly, then there exist a, 6, and c in C such that /(*i» Zti) — azi + bz2 + c mod A for all z\, z2 G C. Proof. We lift / to F : C x C —► C/A with F jointly continuous and separately analytic. Then we lift F to a function F : C x C —► C that is jointly continuous and separately analytic. The equality F{z\ -f rnui -f nc^, 2:2) = F(z\, Z2) mod A says that F(zi + mu>i 4- na;2,Z2) = F(z1,z2) + miwi -f nj^ with mi,ni in Z depending on zi,Z2,rrc,n. For fixed m and n, rrci and ni are continuous in (zi, 2:2)1 hence constant. Differentiation gives —(zi imui +nu2,Z2) - -q^-\zuz2) OF, x 5F, — (zj + mcji + nw2, 2:2) = ^-(zi»*2)- dF dF Hence —— and —— are elliptic in the first variable and thus constant dz\ dz2 in z\. Similarly they are constant in Z2 and thus globally constant. Say dF dF —— = a and —— = b. Then we have F(z\}Z2) = az\ + bz2 + c for a OZ\ OZ2 suitable c, and the lemma follows. Proof of Theorem 6.17. First we note that y>(0) is the identity element of E(C)} translation is analytic on E(C)t and addition is continuous on E(C). (For the last of these, it is enough to prove continuity at (0,0), where it is clear.) Form / : C/A x C/A —► C/A given by f{zuz2) = <p~l(<p(zi) + <p(z2)) niod A.
164 VI: COMPLEX POINTS This is analytic in each variable and is jointly continuous. By Lemma 6.18, f(zx, z2) = azx + bz2 + c mod A. Since /(0, 0) = 0, c = 0. Since /(z,0) = z, az = z mod A for all z and thus a = 1. Similarly 6=1. Thus f{zuz2) = z\ + z2, i.e., zi + 2r2 = ^"U^i) + V>(*2)) mod A. Applying <p, we obtain ¥>(*1 +Z2) = </>(*l) + </>(*2), as required. An analytic map h : E(C) —► #(C) fixing 0 is called an isogeny. (More generally an analytic map h : #i(C) —► E2(C) carrying identity to identity is an isogeny.) We can reinterpret an isogeny h by means of (p : C/A —► E(C), i.e., we can study <p~x oho (p. In Lemma 6.18, we put f{z\yz2) = (p~l o h o <p(zi). The lemma gives <p~~l oho <p(z) = az -f c. Since /i(0) = 0, c = 0. Thus our reinterpretation is h(<p{z)) = <p(az). The trivial maps of this sort have a £ Z. If a is in R but not Z, take z = w\ to see that the map is not well defined. Thus the only nontrivial isogenics from E to itself have a £ C with a not real. In this case we say that E has complex multiplication. Proposition 6.19. E over C as above has complex multiplication if and only if u2/w\ lies in a quadratic extension of Q. Proof. With a as above, h cannot be well defined unless a A C A. Conversely if a A C A, let z\ = z2 mod A, say z\ — z2 -f u>. Then azi = az2 + aw G az2 -f A shows that z —► az is well defined on C/A, i.e., we can define an isogeny h. Now suppose aA C A. Then awi = mu>i -f nu;2 with m,n £ Z. If t = lj2/{jJ\ , this says a = m -f nr. Now au>2 = (m + nr)u>2 equals m'u;i + n'u^- Dividing by u\ gives (m -f nr)r = m' -f n'r. Thus r satisfies a quadratic equation with integer coefficients. Conversely if rr2 + sr + t = 0, then (rr)r = -< - 5r. Define a = rr g R. Then au>\ = ra;2 and act^ = — twi — su2, so that a A C A.
5. OVERVIEW OF INVERSION PROBLEM 165 5. Overview of Inversion Problem In §§1-4 we have seen how to associate to a lattice A = Zu>i + Zu;2 in C a nonsingular curve E : y2 = 4x3 - g2x - g3 with #2 and </3 in C. This association is implemented by the biholomor- phic mapping (p : C/A —► E(C) given by ^V ; 1(0,1,0) for ^ G A. V ; In this section we address a converse problem. We know that every elliptic curve over C, after the linear change of variables that changes (3.23) into (3.26), is of the form y2 = 4(x - a)(x - b)(x - c) (6.10) with a,6,c distinct in C. With another linear change of variables, we can even make a-f 6-f c = 0, and in this case the x2 term drops out from the right side of (6.10). Our question is: For such a curve E, does there exist a lattice A such that the above association carries A to E? The answer is "yes." The positive answer to this inversion problem is a fairly easy consequence of the Uniformization Theorem in the theory of one complex variable. However, we prefer to take a more pedestrian approach, since it gives more information. Essentially we shall produce concrete formulas for the inverse of (p in (6.9). In terms of parameters, we need to use E to construct a pair of generating periods w\ and l^. Then we can form the corresponding p function, and we shall recover a}byc as the parameters wi,«2,^3 attached to this p function. The construction of the parameters will be a byproduct of the construction of <p~~l. To see the nature of <p~l, let us assume that E does indeed arise from w = p(z). Then the results of §3 give f-jjM =4(w-a)(w-b)(w-c) dw —. = = dz 2y/(w - a)(w - b)(w - c) z{w)=ro n jw w i- (6n) J 2\/(w — a)(w — b)(w — c)
166 VI: COMPLEX POINTS This integral is called an elliptic integral of the first kind. In order to assert that w = p(z) is genuinely the inverse function of an elliptic integral of the first kind, we need to make sense out of (6.11). The first step is to make sense out of the double-valued integrand; we carry out this step in §7 after an essay on analytic continuation in §6. Then in §8 we clarify what path of integration to use in order to make sense of (6.11). Different choices of path will lead to different values of the integrand, but the different values will all lie in a translate of a lattice, namely the lattice we take as A. Once we have made sense of (6.11), we can show it has a locally defined inverse, and we can show that the locally defined inverse extends to be globally meromorphic. The rest will be easy. In §9 we shall illustrate how to make rapidly convergent numerical calculations for passing back and forth between A and (#2,03), particularly when 02 and g^ are in R. 6. Analytic Continuation A function element is an analytic function on a disc. If the disc is centered at z = a, we speak of a function element at a. We shall define a notion of "direct analytic continuation" of a function element; this will be another function element of a certain kind. Iteration of this notion will lead to the definition of "analytic continuation" of a function element along a path. To define direct analytic continuation, let the first function element be f{z) ~ a0 + ai(z - a) + a2(z - a)2 + ... for \z - a\ < r(a). (6.12) Suppose s satisfies \s — a\ < r(a)} so that the disc \z-s\ <r(a)-\s-a\ (6.13) lies completely within the disc \z — a\ < r(a). Then we can compute the derivatives of f(z) at s to obtain f(z) = b0(s) + &i(s)(z - s) + b2(s){z - s)2 + .... (6.14) For fixed s satisfying \s - a\ < r(a), Taylor's Theorem shows that (6.14) is convergent for z as in (6.13). But more is true. If we write (z-a)n = [(z-s) + (S-a)]n, (6.15)
6. ANALYTIC CONTINUATION 167 then we can compute the coefficients 60(5),61(5),... from (6.15); The result is that fro(s),&i(s), • •• are convergent power series in s — a. Effectively the bj(s) are obtained by expanding (6.15), substituting into (6.12), and rearranging. We say that (6.14) arises from (6.12) as a result of rearranging the series at s. A second function element is a direct analytic continuation of the given function element if its center s satisfies |s — a\ < r(a) and if the series for the second function element arises from the given function element as a result of rearranging the series at 5. It may be that r(s) for the second function element is in fact larger than the obvious possibility r(a) — \s — a\ given in (6.13); in fact, this will be the case of interest for us. Fix a and b} and let C be a path from a to 6 given by 7(0» <* < t < 0 with 7(*) = a> 7(0) = 6- Let / be a function element at a. A function element g at b is said to be an analytic continuation of / along C if there exist (1) a partition a = t0 < *i < ••■ <tn = /?, (6.16) (2) discs Dj centered at 7(2j), 0 < j < n, and (3) function elements fj at j(tj), 0 < j < n, defined on Dj such that /o = /, fn = <7, 7[*j-i>*j] Q Dj-i for 1 < j' < ™> and fj is a direct analytic continuation of /;_i for 1 < j < n. Example. Let C be the unit circle 7 = e*', 0 < t < 2ir. Let /(z) = y/z in \z — 1| < 1, with the square root taken to be positive on the positive reals, and let g(z) = y/z in \z — 1| < 1, with the square root taken to be negative on the positive reals. It is easy to check that g is an analytic continuation of / along C. The discs Dj can be taken to be centered at the points e'**/4, 0 < j < 8. Proposition 6.20. Let C be a path from a to 6, and let / be a function element at a. If (71 and gi are function elements at 6 that are analytic continuations of/ along C, then g\ and gi have the same Taylor series expansions about z — b. (Only the radii of their discs of definition can be different.) Proof if D0 D 7[<*,/?]. In terms of the partition (6.16) used to determine one of g\ and g2, we claim that the Taylor series for fn is the result of rearranging the series for /o at y{tn)- If so, then certainly
168 VI: COMPLEX POINTS g\ and g2 wiH have the same Taylor series expansions. We proceed by induction on n, the case n = 1 being true by definition of direct analytic continuation. Assuming the result for n — 1, we then know that /n_i is the result of rearranging the series for /o at 7(<n_i), so that fo(z) = fn-\(z) for z £ Do C\ Dn_i. By definition of direct analytic continuation, fn-\{z) = fn{z) for z G A1-1 H Dn. Thus fo(z) = fn(z) for z E D0 nDn_i C\Dn. This intersection is nonempty since j{in) is in it. Thus fo{z) = fn(z) on D0 fl Dn, and the assertion follows. Proof in general case. The main step is to see that the partition (6.16) can be suitably refined without adjusting the series of the initial or final function element. Recall that t[<j-i,<j] C Dj-\. Then 6 = min distance(7[<,_i,<j], D^_i) i<i<n is > 0. Fix any positive number p with p < 6. By uniform continuity of 7 (with € = p)y we can refine (6.16) to a partition a = so < ••• < sm = 0 (6.17) so that 7[*i-i»*] £ {|* - 7(*»-i)l < P} n {\z - y(si)\ < p] for 1 < i < m. Define At = {\z - j{si)\ < p) for 0 < i < m. Let tj-utj (6.18) be two consecutive points of (6.16) and let 5jb_i = <j_1, 5jfc, . . . , 5/ = tj (6.19) be the corresponding points of (6.17). Now the series for /;-i converges in Dj_i, and all the discs Ajb_i, A*,..., A/ lie in -Dj-i, by definition of 6. Hence the special case shows that fj results whether we use the partition (6.18) and discs Dj_i,Dj or the partition (6.19) and discs Ajk_i, Ajfc,..., A/. Putting these results together for 1 < j < n, we see that fn is the result of analytic continuation from /o using the partition (6.17) and the discs Ao,..., Am. Now we can compare g\ and #2- Let us say that the above argument would work with 6 — 6\ for g\ and with 6 = 62 for g2. By choosing p < min{6i,62}, we can choose a common refinement of both partitions and see that #i and g2 are obtained from a common sequence of points and discs.
7. RIEMANN SURFACE OF THE INTEGRAND 169 Proposition 6.21. Suppose C is a path from a to 6 given by y(t)y Q 5* * < 0- Suppose / is a function element at a and g is a function element at b obtained by analytic continuation of / along C. Then there exists € > 0 with the following property: If C" is any path from a to 6 given by il>(t)} <*<*</?, that is uniformly within e of 7(f), then g is an analytic continuation of/ along C". This proposition has the following use: We can always approximate C uniformly by polygonal paths, and thus polygonal paths give us the most general analytic continuation. We omit the proof of Proposition 6.21, as it is in much the same spirit as the previous proof. Proposition 6.22. If g is an analytic continuation of/ along a path C\ then / is an analytic continuation of g along the reverse path C"1. PROOF. Without loss of generality, let g be a direct analytic continuation of /, so that the given partition is a = to < i\ = /?. We apply the construction in the second half of Proposition 6.20, obtaining a refinement a = sq < • • • < sm = /?, discs A, for 0 < i < m, and function elements (/,, At) such that 7[st_1,5t] C At_i flAj, /o has the same series as /, fm has the same series as g} and fc is a direct analytic continuation of /,_i. For the path C"1, we use the partition -/? = r0 < • • • < rm = -a with r,- = — 5m_,-, the discs D{ = Am_,-, and the function elements 9i = fm-i' Then 7-1[r.--l>n] = 7"1[-5m-i+l,-*m-i] = 7[*m-i.*rn-i+l] C Am-jDAm_t>i = DiHDi-i C Dt_L Also the fact that Am_t and Am_t+i have the same radius shows that fm-i and /m_,+i are direct analytic continuations of each other, hence that gi is a direct analytic continuation of <7,_i. 7. Riemann Surface of the Integrand Let a, 6, c be distinct points in C. In this section we treat the problem of defining y/(z - a)(z - b)(z — c). Fix z0 € C - {a, 6, c}, and fix q0 as one of the two possible values of \/(zq — a)(zo — b)(zo — c). Then y/(z — a)(z — b)(z - c) at least extends to be analytic in any disc about zq within C — {a,6,c}.
170 VI: COMPLEX POINTS We shall define a double covering H of C — {a, 6, c} so that >/(z - a)(z - fc)(z - c) extends to be well defined on all of 11. Then we shall complete 71 to a space IV over C U {oo} that fills in one preimage above each of a, 6, c, oo. The space IV will be a one-dimensional complex manifold (Riemann surface), and it will be compact. (Actually it will be a torus.) Consider loops in C— {a, 6, c] based at z$. If such a loop C is piecewise smooth, we define its total winding number about a, 6, and c to be n(C) = n{C} a) + n(C} b) + n(C, c) 2iri\Jcz-a Jcz-b Jcz-cJ If the loop C is not piecewise smooth, all piecewise smooth loops that are uniformly sufficiently close to C can be seen to have the same total winding number, and we take this to be n(C). Homotopic loops based at z0 have the same n(C). Then n(C) gives a group homomorphism from the fundamental group 7rj(C — {a,6,c}, zq) into Z. This homomorphism is onto Z since there exists a loop that winds once around a and 0 times around 6 and around c. Therefore the subgroup of even total winding number H = {Ce n(C - {a,6,c}, z0) I n(C) G 2Z} has index 2. Let 1Z be the covering space of C — {a,6,c} corresponding to //, let e : 72. —► C — {a, 6, c} be the covering map, and fix (o as a base point in H with e(Co) = ^o- For use in §8, we record how H is generated in terms of tt^C — {a,6,c}, zq). Let ro,r&,rc be simple piecewise smooth loops in C — {a, 6, c] based at z0 that intersect one another only at z0 and that have winding numbers n(rflla) = 1, n(ra,6) = n(ra,c) = 0 n(r*,6) = 1, n(r»,a) = n(r6,c) = 0 (6.20) n(re,c) = l, n(rc,a) = n(rc,6) = 0. Define Ti = rar6 and r2 = r6rc. (6.21) Proposition 6.23. The subgroup of ir{(C- {a,6,c}, z0) of even total winding number is generated by the classes of fi, r2) rj, T^, rj.
7. RIEMANN SURFACE OF THE INTEGRAND 171 PROOF. The loops ra,r6,rc generate 7Ti(C — {a, 6, c}, z0)y and each of the purported generators is in the subgroup of even winding number. Thus let a word in ra,ri,,rc and their inverses be given, and suppose it has even total winding number. Since n(-) gives +1 or —1 for each factor, there must be an even number of factors. Grouping them in pairs, let us show that each pair is generated in the required fashion. If the left hand member of a pair is T"1, we replace it by (r2)_1r; If the right hand member of a pair is r*"1, we replace it by r(r2)-1. In this way we may assume that the factors of the pair come from ra,r&,rc, not their inverses. We may assume the members of the pair are not the same, since the squares are in our list of generators. For the remaining six pairs, rar& and Tt,Tc are generators, and the other pairs are given by rbTa = rl(rarbrlrl rert = r2e(rtrc)-1rl r.re = (r.rt)(r?)-1(r»re) rera = r?(r»re)-,r?(rar»r1r2. This proves the proposition. The covering space becomes a Riemann surface in a natural way. Namely let D be a disc about p in C — {a, 6, c}. Since D is simply connected, it is evenly covered. Thus e-1(D) — Ai U A2, a disjoint union of open sets Ai and A 2 each homeomorphic with D via e. We can take e : £ —► z £ C as a local coordinate in A,-, and then (e, Aj) becomes a chart. We shall use the convention begun above that Greek letters refer to points and sets in H and the corresponding Latin letters refer to their counterparts in C — {a,6,c}. We have already fixed Qq with e(Co) = zo • Let Do be a disc in C — {a, 6, c] about zo, and let Ao be the component of e"1(Do) to which Co belongs. Recall that y/(z — a)(z - b)(z — c) makes sense on Do as an analytic function with y/(zo — cl)(zq — 6)(zo — c) = <jo- Proposition 6.24. There exists a unique analytic function F :7l —+ C satisfying F((q) = q$ and having the following property: If C is any path from Co to £i in 7Z} if D is a disc about z\ = e(£i) in C — {a, 6, c}, and if e_1 : D —► A is regarded as a map from D to the component A of e_1(D) to which Ci belongs, then (Foe"1, D) is an analytic continuation of (y/(z-a)(z-b){z-c)f D0) (6.22) along e(C).
172 VI: COMPLEX POINTS Remark. Temporarily we shall write /(C) = >/«-a)«-&)«-c) for the function produced by the proposition. We shall introduce better notation in (6.23). Proof. Uniqueness is an immediate consequence of Proposition 6.20 and the path-connectedness of 71. For existence, let fo{z) be the function element (6.22). It is clear that this /0 continues analytically along any curve C in C — {a,6,c} and that the result is one of the two values of the square root, because fo(z)2 extends to a global function and analytic continuation respects squaring. Given Ci in 1Z, let C be a path from Co to Ci and let f\ be a function element at z\, with domain D\, that is an analytic continuation of /o along e(C). Let Ai be the component of e"2(Di) to which Ci belongs, and define e"1 as a function from D\ to Ai. Then we can define an analytic function F on Ai by F = f\ o e"1. For F to give a globally defined analytic function on 7£, we need only see that F is well defined. Thus let C# be another path from Co to Ci, and let ff be the corresponding function element at z\ obtained by continuation along e(C*). We are to show that f\ and ff have the same Taylor series about z — z\. Analytic continuation of /o around the loop Ta leads to —/o, and similarly for T& and Tc. Thus analytic continuation of /o around a loop 7 in C —{a,fc,c} based at zQ leads to (-l)n^7^/o. Since CC#~l is a loop in H based at Co, e(CC#~l) leads to /0. Since analytic continuation of /o along e(C) leads to /i, analytic continuation of /i along e(C$~l) ~ e{C*)~l leads to /o. By Proposition 6.22, analytic continuation of /o along e(C#) leads to f\. By Proposition 6.20, /i and ff have the same Taylor series expansion. Thus F is well defined, and the proposition follows. Now we shall complete Tl to a space TV. The set TV is to consist of the points of TZ together with four more points a,/?,7,00. We define e :TZ* -+C\J {00} = PX(C) by e(a) = a, e(/?) = 6, e(7) = c, e(oo) = 00. To topologize TV', we declare that 7£ is to be open in 7£* and that basic open neighborhoods of a,/?,7,00 are to be defined as follows: A basic open neighborhood of a is given by {a} Ue"l(Dx), where Dx is any punctured disc centered at a and lying in C — {a,6,c}, and basic open neighborhoods of /?,7,oo are defined similarly. The resulting TV is easily seen to be a Hausdorff regular space with a countable base,
7. RIEMANN SURFACE OF THE INTEGRAND 173 therefore a separable metric space. It is sequentially compact, therefore compact. With this notation the function F(£) of Proposition 6.24 is better written as nO = >/K-«)«-0)«-7). (6-23) This is the notation we shall normally use for F. Let us make IV into a Riemann surface. We need to define charts about a,/?,7,oo. In the case of a, let Dx be a punctured disc about a lying in C — {a,6,c}. Then e~x(Dx) has two preimages above each point of Dx. The lift of a loop in Dx with winding number 1 about a is not a loop (since y/z — a is not well defined on Dx) and therefore e~l(Dx) is connected. It follows that e"1(Dx) is a covering space of Dx. Arguing as in Proposition 6.24, we can define y/C, — a uniquely on e"1(Dx) in such a way that y/C, — Q takes a prescribed value at some base point. We put \/C ~ a = 0 if C = a. Then (t/C — «, e~l(Dx) U {a}) is a continuous map into C. Since it maps small discs to open sets, it is open. If VCi ~~ a = VC2 — <*, then 21 — a = z^ — a and z\ = Z2- Hence Ci and C2 lie over the same point z. If z = a then £i = £2 = a. Otherwise the square root changes by a factor of —1 between the two points of the same fiber. Thus >/£i — a = v^2 — <x implies £i = £2, and the map is one-one. Consequently (1/C "~ qj e~l(Dx)\J {a}) is a chart. It is clear that this chart is compatible with the charts on 1Z. We treat /? and 7 in the same way, and we are left with handling the point 00 of IT. For this we let Dx = {z \ \z\ > R} for R > max{|a|, |6), |c|}, and we check that e"1(Dx) is connected. We define —-=. on Dx, fixing the value at some base point and defining the function vC to be 0 at e_1(oo). Then ( -j=,e~l(Dx U {00}) j provides a compatible chart about 00. The function F(Q'1 = * _„—== with F(Co)-1 = l/q0 is analytic on H and is continuous from IV into CU{oo}. By Riemann's removable singularity theorem, it is a meromorphic function on 11. Let lis see its behavior near a, /?, 7,00. Near a, the local coordinate is s = y/C — ot. To calculate with this relation, we cannot use the manifold variable £ but have to use the complex variable z. Thus we have s2 = z — a, z = s2 4- a, and (2 - &)(* - c) = (s2 + a - b)(s2 + a-c).
174 VI: COMPLEX POINTS Hence F(C) =S V(s* + a - b)(s* + a-c) (6'24) for one of the two determinations of the square root. Since the square root is analytic and nonvanishing near 5 = 0, F(£)~l has a simple pole at £ = a. Similarly F(C)"1 has a simple pole at £ = /? and at £ = 7. Near 00 the local coordinate is 5 = —^=. Then 2 = s~2 and vC (,-.K.-»K.-«)-^(l-:)(.-i)(i-f) = z3(l-as2)(l-&s2)(l-cs2). Hence F(C)-1 = s3 . 1 = (6.25) V ' y/(l - a«2)(l - 6*»)(1 - c«2) v ; for one of the two determinations of the square root. Since the square root is analytic and nonvanishing near s = 0, /(C)-1 has a triple zero at C = oo. 8. An Elliptic Integral We continue with the notation of §7. Thus F : IV —► C U {00} is as in (6.23) and later, and Co is the base point of IV with e(£o) = zo- Our first objective is to make sense of the expression w(C) = J $n<rldc (6.26) where C is any piecewise smooth path in TV. Roughly the idea is that (6.26) is to be f<c) \F{z)~ldz, but the difficulty is that F(z)~l is not necessarily single-valued on e(C). The point of the reference to IV is to allow us to make a choice between F(z)-1 or -F(z)~l for each point of the curve. To be precise, let C be given by C(t) for t £ /, so that e(C) is the path in C U {00} given by e(C(0). Then (6.26) is to be made precise by the definition w(C)= f ±F(C(t))-le(C)'(t)dt. (6.27)
8. AN ELLIPTIC INTEGRAL 175 We need to check that the integral (6.27) is convergent. The only difficulty is in neighborhoods of a,/?,7,00. Thus suppose C is confined to a basic neighborhood. Near a, the local parameter is s = >/£ — a, and the path C(t) determines a path s(t) in the coordinate patch by s{t) = y/C(t) - a. Then s{t)2 = e(C(t)) - a, e(C(t)) = s(t)2 + a, and e(C)'(t)dt = 2s(t)s'(t)dt. Using (6.24), we substitute into (6.27) and obtain s'(t)dt u(C) = I 6/ \/(s(02 -fa- 6)(s(*)2 + a-c) (6.28) for a suitably small interval J. The denominator is bounded away from 0 for t G /, and the integral is convergent. Similar remarks apply to /3 and 7. Near 00, the local parameter is s = —=, and we have s(t) = —, Then S(02 = l/e(C(t)), e(C(t)) = s(t)-2, e(C)'(t)dt = -2s(t)-3s'(t)dt. Using (6.25), we substitute into (6.27) and obtain »(C)=[ -S'Wdt (6.29) w[ for a suitably small interval J. Again the denominator is bounded away from 0 for t £ /, and the integral is convergent. Thus (6.27) is always convergent, and we can use it as a definition of (6.26). Lemma 6.25. If C is a piecewise smooth path in It* lying completely in a basic disc with local parameter s and extending between local parameter values s\ and s^, then w(C) = £3±F(«s))-lfsds. (6.30) In particular, the integral depends only on the endpoints, not on the path itself. PROOF. For a disc in H with local parameter z, the lemma follows by applying the Cauchy Integral Theorem to a branch of
176 VI: COMPLEX POINTS l/y/(z — a)(z — b)(z — c). For a disc centered at a, the integral is given by (6.28), which is the value of the complex line integral / * (6.31) v/(52 + a - 6)($2 + a-c) over the local version s(t) of the path C(t). Since the integrand of (6.31) is analytic in a disc containing the image of the path s(t)y the line integral depends only on the endpoints and is given as in (6.30), by the Cauchy Integral Theorem. Similar remarks apply to (3 and 7. Near 00, the integral is given by (6.29), which is the value of the complex line integral -ds (6.32) / y/(l - as2)(l - bs2)(l - cs2)' and the same considerations apply. Lemma 6.26. If C and C" are piecewise smooth paths from £i to £2 in H* that are homotopic in IV, then w(C) = w(C). Proof. Let the homotopy be Ct, 0 < i < 1, where each Ct has domain interval / and each Ct extends from £i to £2- Without loss of generality we may assume that each Ct is piecewise smooth. It is enough to prove, for each *o> that w(Ct) is constant for / in a neighborhood of to. Changing notation, we see that it is enough to prove that any piecewise smooth path Ci that is sufficiently close to C\ uniformly for t € / has w{C2) = w(d). About each point of Ci, choose a basic disc A and a proper subdisc A" centered at that point. The discs A" cover the image of Ci, and we extract a finite subcover A'/,..., AJJ. Form the corresponding larger discs Ai,..., A„ and intermediate discs A^,..., A'n with A'/ C A;/ C a; C A; C A; for 1 < i < n. By uniform continuity of C\) there is some 6 > 0 so that Ci(t) G AJ; implies C\(t') € AJ for \t'-t\<6. Let 10 < • • • < tm be a partition of the domain interval / of Ci of mesh < 6. For each j with 0 < j < m, we can choose i = i(j) so that d(tj) is in A'/(jy Then Ci([<j-i,<;]) C Afi{j_x) for 1 < i < m. Choose e > 0 so that dist(A[,A£) > e for 1 < i < n. Suppose C2 is any path from Ci to ^2 that is uniformly within c of C\. Then we have C2([<j-i,^]) C Aiy_i) for 1 < j < m. In particular, Ci(tj) and C2(tj) are in At-y_!)n A,-qj, which is connected. Let 5; be a piecewise smooth
8. AN ELLIPTIC INTEGRAL 177 path from C\(tj) to C2(tj) in A^y^ij fl A,-y), 1 < .; < m — 1. Let So be the constant path at C\(to) and let Sm be the constant path at Cm(to). By considering cancelling integrals, we have m w(C2) = ^^(5;_1C2|[,i_1,tj]5>). (6.33) i=i But 5j-iC2|[t-_llt-]S'<y is a path in A^-i) with the same endpoints as Ci\[tj_lttj]. Thus Lemma 6.25 says that (6.33) is m = Ew(Ci|[ii.liti]) = w(Ci). This proves the lemma. Recall from Proposition 6.23 that the classes of the loops ri,r2,r^, r£,r^ generate the subgroup of 7Ti(C —{a,6, c}, z0) of even total winding number. Lift these loops to loops fi, f2, ra, Tp} T7 in H based at Qo- Lemma 6.27. /r ^F(Q"ldQ = 0, and similarly for /? and y. Proof. In view of Lemma 6.26 applied just within 1Z} the computation reduces to an integral over a loop in a basic disc of TV about a. Then Lemma 6.25 applies and shows we get 0. Define Wl= / ^(C)-1^ and w2 = / ±F(C)-ldC, (6.34) and let A be the subset of C given by A = Zu;! + Zw2. (6.35) We give C/A the quotient topology, which is not yet known to be Haus- dorfF. Proposition 6.28. There is a well defined map w : TV —► C/A with the following property. Whenever £ is in IV and C is a piecewise smooth path from £o to C, then w(C) = ™{C) mod A. The map w is continuous and open.
178 VI: COMPLEX POINTS Proof. Fix C in TV, let C be a piecewise smooth path from Co to C, and define tu(C) = w(C) -f A as a member of C/A. To see that w(Q is well defined, let C" be another such path. We are to show that w(C') — w(C) is in A. Without loss of generality we may assume the following: If C £ {0,^,7,00}, neither C nor C" meets {a,/?,7,00}; if C G {a,/?,7,00}, C and C" meet {a,/?,7,00} only at their final points. We have w(C')-w(C) = w(C'C-l)} and C'C~l is a loop based at Co- By Lemma 6.26, we can perturb C'C~~l near C, if necessary, so that C'C~~l does not meet {a,/?,7,00}. Afterward, w(C'C~l) depends only on the homotopy class of C'C"1 in 72., according to Lemma 6.26. Since Proposition 6.23 identifies the homotopy classes in 7£, we see that w(CfC~x) is a Z combination of ti;(fO,ti;(f2),ti;(ra),ti;(r^),ti;(r7). But w(Ta) = w(Tp) = w(T7) = 0 by Lemma 6.27. Thus w(C) — w(C) is a Z combination of u^ = w(Ti) and U2 — ^(^2), hence is in A. Thus w is well defined. If we fix Ci and a path C\ leading to it, we can look at how w behaves in a basic disc about Ci by using paths that begin with C\ and continue completely within the basic disc. Let C correspond to C\C in this way, and let Ci ♦-► s\ and C <-► s in the local coordinate. Then Lemma 6.25 gives u/(C) = "(Ci) + w(C) = w(d) + J' $F(C(8))-l^d8. (6.36) This formula shows that w is continuous. Since the integral on the right is nonconstant analytic in 5, w is open. Corollary 6.29. The complex numbers u)\ and u/2 in (6.34) are linearly independent over R. Therefore A in (6.35) is a lattice, and C/A is a Riemann surface. Proof. If wx and u/2 are dependent, then A C Ru; for a suitable complex number u. The natural map /j, : C/A —► C/(Ruz) is continuous and open, with a Hausdorff image. Hence the composition flow : IV —► C/(Ruz), with w as in Proposition 6.28, is continuous and open. Since TV is compact and C/(Ruz) is Hausdorff, the image of fi o w is open, compact, and closed. Since C/(Ru/) is connected, \i o w is onto. But C/(Ru>) is noncompact, and we have a contradiction.
8. AN ELLIPTIC INTEGRAL 179 Theorem 6.30. The map w : IV —► C/A of Proposition 6.28 is one-one onto and is biholomorphic. Remarks. The map is well defined by Proposition 6.28, and C/A has a complex manifold structure as a result of Corollary 6.29. Thus it is meaningful to ask about holomorphicity of w (and also w"1, once it exists). Proof. The map w is holomorphic as a result of (6.36). In local coordinates, (6.36) shows that w has derivative ^(c(s))_1S- (637) At a point £ of 7£, where s = z is a local parameter, (6.37) is just 1 2N/(s-a)(s-6)(5-c)' which is nonvanishing. At £ = a, (6.37) is the integrand of (6.31) and again is nonzero. Similar remarks apply to C = P and C = 7- Finally at C = oo, (6.37) is the integrand of (6.32) and again is nonzero. Since the local derivative is nowhere 0, w is an immersion. An immersion between compact connected smooth manifolds of the same dimension is necessarily a covering map, and thus w : It* —► C/A is a covering map. If we can show w is one-one, then w~~l will exist, and w"1 will be holomorphic as a consequence of the one-dimensional complex Inverse Function Theorem. Thus we are to show that w is one-one. Suppose tu(Ci) = w(£2) in C/A. Let Ci and C2 be piecewise smooth paths from Co to £i and £2, respectively. By definition of w(Qy w{Ci) = w(C2) + u>, for some u = mw\ -f nc^ in A. Since r = r^TJ has w(T) = a;, the path C3 = TC2 goes from Co to £2 and has w{C$) = w{C2) + w. Therefore w{d) = w{C3). (6.38) Let us say that C\ and C3 have domain [0,1]. For 0 < t < 1, we define C1#(t) = w(C1\[0tt]) and C3#(t) = w(C3\[0>t]) as piecewise smooth paths in C. Define Ci## = woC1 and C3## = woC3 (6.39)
180 VI: COMPLEX POINTS as piece wise smooth paths in C/A. Then Ci#(J) mod A = ti;(Ci|[0(t]) m°d A = w(Ci(t)) mod A and a similar computation for C3 show that C\* and 63^ are the unique lifts based at 0 of Ci## and C3## from C/A to C. By (6.38), Ci#(l) = C^{\). Therefore C\$C^~~l is a loop in C, necessarily contractible. Hence Ci^^C^^'1 is a contractible loop in C/A. Since w has been proved to be a covering map, (6.39) shows that C\C^1 is a contractible loop in IV. Therefore C\ and C3 have the same endpoint in 7£*, and £x = £2. We conclude that tu is one-one, as required. Theorem 6.30 completes the construction of the lattice A and a function w(Q that should have something to do with inverting a Weierstrass p function. To make the correspondence complete, we modify w(() by translation in C/A by the element J^° ^F(Q~ld( of C/A. (Recall from (6.27) and Proposition 6.28 that this element is well defined.) Thus our new definition of w(£) will be w(()= [ ^(O^dC + A as a member of C/A. (6.40) Joo The new w : IV —► C/A is still biholomorphic, and we let w~l : C/A —ft* be its inverse function. Let /1 : C — C/A be the (holomorphic) quotient mapping, and recall that the extended version of e, no longer a covering map, is a function e:7T — CU{oo} and is meromorphic. Define P : C —► C U {00} as the meromorphic function P(z) = eow'lofi(z). (6.41)
8. AN ELLIPTIC INTEGRAL 181 Theorem 6.31. The function P(z) in (6.41) is given by P(z) = p(z)+i(a + 6+c), (6.42) where p is the Weierstrass p function for the lattice A in (6.35). PROOF. The function P is meromorphic, and it is doubly periodic since it factors through C/A. Hence it is elliptic. Its only poles are in A because P-\{oo}) = n~\w{e-\{oo}))) = /i"1 (t*({oo})) = /i"1(0 + A) by (6.40) = A. Now we examine the order of the pole of P(z) at z = 0. The maps /x and w"1 are locally invertible. Thus the order of the pole of P is the same as the order of the pole of e at oo in IV. The local parameter is s = -^ near oo in IV. So e(£) = z = 5~2. Thus the pole has order 2. Next let us see that P is an even function. For r in C, put £ = w~~1fi(z). Let C be a piecewise smooth path from oo to £ in IV. The Riemann surface IV has a holomorphic involution (namely interchange of the two sheets), and we let C" be the image curve extending from oo to a point (J. Then (C') = / ^(C)-^C mod A Joo (along C') __ p J p/f\-ijr since the integrands ""i.(*MC,' (° C in (6.27) match = —ty(C) mod A. Hence and C = u»-1(-t»(C) + A) P(z) = et»-V(0 = e(C) = e(C') = efuT'f-u/K) + A)) = e(«T'(-u;(«r V(*)) + A)) = e(w-l(-p(z) + A)) = t{w-\ii(~z) + A)) = P{-z), as asserted.
182 VI: COMPLEX POINTS Our classification of poles of P shows that P has order 2. Thus it has only two zeros, counting multiplicities. The proof of Theorem 6.10 gives the structure of even elliptic functions and shows that P(z) = kl(p(z)-k2). (6.43) Let us see that P satisfies the expected differential equation. Put ip = w~lfi(z) € IV and restrict attention to points ip € 1Z. Then p = e(ip) is a local parameter in C, and we have P(z) = ew-1fi(z) = e(iP)=p. Since w(rp) = p(z), (6.40) gives I HO'1 Joo * € / £F(CrldC + A. By (6.30), we have as local expression for a derivative dp 2 v'4(p-a)(p-6)(p-c) for a certain branch of the square root. Consequently ^ = v/4(p-a)(p-6)(p-c). Replacing p = P{z), we see that P satisfies P'2 = 4(P - a)(P - 6)(P - c). (6.44) Now we evaluate k\ and £2 in (6.43). Since P is even and has a double zero at z = 0, we can write P(z) = f^. + c0 + 0{z2) (as z - 0) z* P\zf = i§2 + 0(,"2) P(z)-a=^ + 0(l) (P(z) - «)(/>(*) - 6)(/>(*) - C)) = ^ + 0(z"4).
9. COMPUTABIUTY OF THE CORRESPONDENCE 183 4c2 o 4c3 o Thus (6.44) implies —=^ = —=^-, i.e., c_2 = 1. Thus fci = 1 in (6.43), and P(z) = p(z) + k. Substituting this formula into (6.44) gives p'2 = 4(p + k - a)(p + fc - »)(p + * - c) = 4p3 + 4(3fc — a — 6 — c)p2 -f lower order. Since also p/2 = 4p3-$r2p-03, we can subtract and conclude from the fact that p(z) assumes more than two values that 3fc — a — 6 — c = 0. This proves the theorem. Corollary 6.32. Every nonsingular curve y2 = 4x3 - g2x - g3 over C is of the form C/A for some lattice A and is parametrized bi- holomorphically by z —► (p(z), p'(z))t where p(z) is the Weierstrass p function relative to A. 9. Computability of the Correspondence The theory of §§1-8 produced a correspondence A <-► (<72> £3) of lattices A in C with pairs (02,£3) of complex numbers such that the right side of E : y2 = 4r3 - g2x - g3 has nonzero discriminant. The correspondence was implemented for each A by a biholomorphic mapping of C/A onto £(C), given in terms of the Weierstrass p function for A, with inverse given in terms of elliptic integrals. In this section we study the correspondence A <-► (<72,<73) a bit further. In particular, we shall see (at least in cases of interest) that there are rapidly convergent expressions for the correspondence that are suitable for (approximate) numerical computations. Let us recall the exact correspondence of parameters. If A is given, then we obtain E by using "*° ! (6-45) u>eA u)?0
184 VI: COMPLEX POINTS If £2 and </3 are given, then we let a, 6, c be the roots of the right side of Et and we obtain A as the lattice generated by where "i= / ^no-'dc and U2= f incr1^, a, /?, 7 are points in 7£* covering a, b, c, (6.46) F(0 = n/«-«)(C-0)(C-7), fi = lift to TV of loop Ti around a and 6, V2 = lift to 7£* of loop T2 around 6 and c. To understand A, we are not concerned about minus signs on ui and u;2, and we can thus replace (6.46) by integrals in C: dz Jr3 2y/(z-a)(z-b)(z-c) dz 2y/(z - a)(z - b)(z - e)' where the square root in each case is a single-valued branch in the z plane slit between the two points in question and slit also between the third point and 00. From (6.45) and (6.47), we can read off one feature of the correspondence. Proposition 6.33. Under the correspondence A <-► (<72><73), A is closed under complex conjugation if and only if g? and g3 are both in R. Proof. If A is closed under conjugation, then (6.45) shows immediately that ^2 and g$ are real. Conversely let gi and g$ be real. Then either a,6,c are real, or one of them (6, say) is real and the other two are complex conjugates of each other. Suppose a,b,c are real. Let us say that a < b < c. For Ti we can use a curve that extends on the real axis on one side of a cut from a to 6 and comes back to a on the real axis on the other side of the cut with the square root having changed by a minus sign. Then wi = / / dX (6.48) J a \/(x - a)(x - b)(x - c)
9. COMPUTABILITY OF THE CORRESPONDENCE 185 for one of the two determinations of the square root. The expression under the square root is positive for a < x < b, and therefore u>i is real. Similarly we can take cj2= / y dX (6.49) Jb V (x - a)(x - b)(x - c) for one of the two determinations of the square root. The expression under the square root is negative for 6 < x < c, and therefore lj2 is imaginary. Hence Q\ = u\ and u>2 = —u;2. Consequently A is closed under conjugation. Now suppose 6 is real and a = c. Whatever path we use for T\, we can use the complex conjugate path for T2, apart from orientation. With suitable choices of the branch of the square root in each case, we can arrange that the integrands in (6.47) are pointwise complex conjugates of each other. Then it follows that u>i and u>2 are complex conjugates of each other, except possibly for a sign. If indeed lji = l/2> then A is closed under conjugation. If lj\ = —u>2, then u>i = — u>2 and G)z = —u>i show that A is closed under conjugation. Our chief interest is in elliptic curves defined over Q. Thus it is not unreasonable for us to restrict attention now to lattices and curves meeting the conditions of Proposition 6.33. There are rapid methods for calculating the sums in (6.45) and integrals in (6.47), and we shall use them in examining the correspondence further. To calculate g? and y3 rapidly, let us assume that ui /u)^ is in the lower half plane. (If it is not, then we have only to interchange u\ and u>2-) Then we shall see in from (8.10) that (6.50) The two series in (6.50) are rapidly convergent: The numerator is a polynomial in n, but the denominator grows exponentially with n. A general theory for handling (6.47) is available, but we shall concentrate on the case that a,6,c are real. Thus we want to investigate (6.48) and (6.49). The tool is the arithmetic-geometric mean of Gauss defined as follows: If a and b are positive reals with a > 6, we recursively let
186 VI: COMPLEX POINTS Then we can see that ao > <M > a2 > • • • > 62 > b{ > 60 (6.51) and liman = limtn, the convergence being quite rapid. The common value of lim a„ and lim 6„ is the arithmetic-geometric mean of a and 6 and is denoted M(a,6). [To see the existence of M(a, 6), we note first that (6.51) is clear. Then (<H - bi) - 2(ai+1 - 6t+1) = (a{ - 6,-) - (a; + 6, - 2\/aJn) = -26t + 2\/aibl > 0 shows that |a,+1 — 6j+1| < ||a; — 6,-|. The required convergence follows.] Proposition 6.34. The integrals in (6.48) and (6.49) with a < b < c are given in terms of the Gauss arithmetic-geometric mean by the following expressions, up to minus signs: dx L y/(z-a)(z-b)(z-c) M(y/^H,y/^b) dx in CJ2 ■L ^/{x — a)(x — b)(x - c) M{y/c - a, y/b - a) Proof. For cji we make the change of variables y/x — a = y/b — a sin 0 and obtain Jo ^/(c - 6) sin2 0 + (c-a) cos2 0 For LJ2 we make the change of variables y/x — b — y/c — b cos 0 and obtain w2 = 2i / Jo J(b - a) sin2 0 + (c - a) cos2 0 Thus the proposition will follow if we prove that 0 < r < s implies that /(r, s), when defined by I(r,s)= =_, (6.53) •A) Vr2 sin2 (9 + s2 cos2 6
9. COMPUTABILITY OF THE CORRESPONDENCE 187 evaluates as We shall prove that J(r,s) = /(W,^). (6.55) Iterating (6.55) and using the existence of M(s,r) and an easy passage to the limit, we see that /(r, s) = I(M{s, r), M(s} r)). (6.56a) But we can evaluate /(M, M) trivially, and the result is I{M'M) = ^!' (656b) The equality (6.56) implies (6.54). Thus it is enough to prove (6.55). To prove (6.55), regard 0 < r < $ as fixed. For 0 < t < 1, the function 2st r r-r increases from 0 to 1. Therefore (s + r) + ($ - r)t2 . * 2ssin<p . ^ ^ 7r sinfl = Y 2 , 0<y><-, (s + r) + (s — r) sin <p I is a legitimate change of variables, and 0 extends from 0 to \. We make this change of variables in (6.53) after rewriting (6.53) as rT/2 cos0d0 7(r,s) = f Jo (6.57) cos2 0Vr2 tan'2 0 + 52 * We readily compute 2« cos ^> [(s + r) — (s — r) sin2 y>] cfy? cos 0 d0 = - - . . 9—— [(s + r) + (s-r) sin2 y?]2 CQC2 g = cos2 ?»[(* + r)2 ~ (* ~ r? sin2 <p] [(s -f r) + (s — r) sin2 <p]2 « 4s2 sin2 u? tan20 = ^2—• cos2^>[(s -f r)2 — (s — r)2sin <p] Substituting these expressions into (6.57) and simplifying, we obtain r*'2 2 dip Jo yJArs sin2 (p + (s + r)2 cos2 <p and (6.55) follows. This proves the proposition.
188 VI: COMPLEX POINTS EXAMPLE. y(y + 1) = (x + l)x(x - 1). This elliptic curve is listed in Table 3.2 with A = 37. Using the transformation rule that passes from (3.23) to (3.26), we are led to This curve has Its roots are E : y2 = 4x3 - 4x + 1. 92=:4 g3 = -1. (6.58) a = -1.10716- • 6 = .269594 • • • c = .837565 The relevant Gauss arithmetic-geometric means are M(y/^Hy V^b) = M( 1.39453 • • • , .753639 • • •) = 1.04949 • • • M{y/F^lL,y/b^d) = M(1.39453- •• ,1.17335- -) = 1.28156- •• Thus (6.52) gives u>i = — /-^=^- = 2.99346 • • • M(y/c — a, vc — 6) u7 = -——J^—-==- = 2.45139 • • • i. M(y/c — a, y/b — a) With the aid of (6.50), we can use u\ and U2 to estimate g^ and g$ as g2 = 4.00000 • • • 03 = -.999999- •• in agreement with (6.58).
CHAPTER VII DIRICHLET'S THEOREM 1. Motivation Dirichlet's Theorem on primes in arithmetic progressions has the following statement. Theorem 7.1. If m and 6 are relatively prime integers with m > 0, then there exist infinitely many primes of the form km -f b with k a positive integer. This theorem was used in the proof of Theorems 5.2 and 5.3. Its proof contains the first use of L functions historically and will help as an introduction to the L functions that are the theme of the rest of this book. It is illuminating to begin with some special cases of Theorem 7.1 as motivation. We first suppose that m = 1. In this case the assertion of the theorem is just that there are infinitely many primes, and there are, of course, many proofs of this result. For our purposes the interesting proof is a relatively complicated argument given by Euler in 1737. We introduce the function n=l for real s > 1. (This function subsequently was named the Riemann zeta function and is defined and analytic for complex s with Re s > 1. We postpone a more serious discussion of £(s) to §2.) By monotone convergence, for example, we have lim<(*)=+oo. (7.2) We can write cw= n —n- <7-3) p prime 1 189
190 VII: DIRICHLET'S THEOREM for $ > 1. (In fact, the product for p < TV is n rrW=n (>+?+£+--)=x:';? with 52' taken over all n with all prime factors < N. Letting N —► oo, we obtain (7.3).) If there were only finitely many primes, then (7.3) would yield limsupC(s) < oo, in contradiction to (7.2). This contradiction completes Euler's proof that there are infinitely many primes. To consider further special cases, it is more convenient to work with iog<(*)= £ \o6'= £ (1 + ^- + ^ + ...), p prime 1 p prime x ' p> (7.4a) which we shall see in the course of proving (7.30) is = E i+»w (7-4b> p prime with g(s) bounded as s j 1. Euler's proof amounts to combining (7.2) and (7.4) to see that ^ j diverges, and the corollary is that there are infinitely many primes. To handle m = 4 in Theorem 7.1, we want to treat primes 4fc + 1 separately from primes \k + 3. Dirichlet's idea for this special case amounts to working with the sum and difference of E £ «<» E £. M p=lmod4 p=3mod4 ^ rather than the two terms separately. Tracing backwards in (7.4), we are led to consider the following expressions, which differ from the sum and difference of (7.5) by bounded terms as s J 1: (7.6a) . p prime, r I 1 p prime, r I (i.OD) \p=4fc+l / \p=4*+3 '
1. MOTIVATION 191 , . ft) ifn = 0 *°(")=(l if„=i Let us write = 0 mod 2 mod 2 {0 ifn = 0 mod 2 (7-7) 1 if n = 1 mod 4 -1 if n = 3 mod 4. With x equal to Xo or Xi> we have x(mn) = x(m)*(n) ^or a^ m anc^ n- Consequently the expressions (7.6) are both of the form *.x>-±*£- n -^ ™ n = l p prime 1 —— p> with x taken as xo for (7.6a) and as xi f°r (7.6b). We can write the analog of (7.4) uniformly as p prime with g(syx) bounded as s [ 1. Therefore log(L(s,xo)^(5,xi)) = 2 Yl — + (*(*. Xo)+0(*,Xi)) P5 p prime p=4Jfc + l log(X(5,xo)^,xi)"1) = 2 ]JP — + (g(8,xo)-9(s>Xi)). (7.9) p3 p prime p=4fc+3 For x = Xoj comparison of (7.3) and (7.8) shows that <{s) = L(siXo)l_2„s- Therefore limL(s,Xo) = +oo. Meanwhile the series in (7.6b) is alternating and converges for s > 0 by the Leibniz test. The convergence is uniform on compact sets, and the
192 VII: DIRICHLETS THEOREM sum L(s,xi) is continuous for s > 0. Grouping the terms of this series in pairs, we see that L(\,Xi) > 0. (Actually we can even recognize the value as w/4 from the Taylor series of arctan x.) Putting these facts about L(s, xo) and L(5,Xi) together, we see that both left sides of (7.9) tend to -foo as s [ 1. It follows from (7.9) that V A and Y i p prime p prime p = 4Jk+l p=4*+3 are both infinite. Hence there are infinitely many primes 4k -f 1, and there are infinitely many primes 4k -f 3. The proof of Dirichlet's Theorem for km -f 6 will proceed in similar fashion. We return to it in §4 after a brief but systematic investigation of the kinds of series and products that we have encountered in the present section. 2. Dirichlet Series and Euler Products A series Y^=\ ~ w^n a* anc* s complex is called a Dirichlet series. The first result shows that the region of convergence and the region of absolute convergence are each right half planes in C (unless equal to the (-l)n empty set or all of C). These half planes may not be the same: ^2 is convergent for Re s > 0 and absolutely convergent for Re s > 1. Proposition 7.2. Let Y^^L, — be a Dirichlet series. (a) If the series is convergent for s = s0i then it is convergent uniformly on compact sets for Re $ > Re so, and the sum of the series is analytic in this region. (b) If the series is absolutely convergent for s = so, then it is uniformly absolutely convergent for Re s > Re so- (c) If the series is convergent for s = so, then it is absolutely convergent for Re s > Re sQ + 1. (d) If the series is convergent at some so and sums to 0 in a right half plane, then all the coefficients are 0. Remark. The proof of (a) will use the summation by parts formula. If {un} and {vn} are sequences and if Un = ]C*=i w* f°r n > 0, then 1 < M < N implies N N-l X) U"Vn = X) U"(V" ~ V"+l) + UNVN ~ Um-\VM. (7.10)
2. DIRICHLET SERIES AND EULER PRODUCTS 193 On 1 Proof. For (a), we write — = -^- • —-— = unvn and then ap- n9 ns° ns-'° ply (7.10). The given convergence means that {Un} is convergent, and certainly vn —► 0 uniformly on any proper half plane of Re s > Re sq. Thus the second and third terms on the right side of (7.10) tend to 0 with the required uniformity as M and N tend to oo. For the first term, the sequence {Un} is bounded, and we shall show that ^|l>n-Vn+l| = ]Tl n = l n = l 1 1 ns-s0 (n+l)*-*o is convergent uniformly on compact sets where Re s > Re So- Use of (7.10) and the Cauchy criterion will complete the proof of convergence. For n < t < n + 1, we have |n-('—)_*-<'-••>!< sup n<Kn + l sup n<t<n + l —-(n-(5-5°) -rf-(5-5°)) dt so t*-a0+l |g-s0| - nl+Re(«-50)" (7.11) Thus K - vn+l | = |„-(--) - (n + l)-(^o)| < Jf£^L +Re(5-j0)' and the sum is uniformly convergent on compact sets with Re $ > Re Sq, by the Weierstrass M-test. Hence the given Dirichlet series is uniformly convergent on compact sets where Re s > Re Sq. Since each term is analytic in this region, the sum is analytic. For (b), we have I On In5 I —I In50 I 1 In50 I Since the sum of the right side is convergent, the desired uniform convergence follows from the Weierstrass Af-test. For (c), let e > 0 be given. Then I an 1 1 nJ0+l + < \n*° I n3+f with the first factor on the right bounded and the second factor contributing to a finite sum. Therefore we have absolute convergence at $o + 1 -f €, and (c) follows from (b).
194 VII: DIRICHLET'S THEOREM For (d), we may assume by (c) that there is absolute convergence at sq. Suppose ai = ••• = a//-i = 0. By (b), YIuLn ~7 — ^ ^or Re s > Re s0. The series (7.12) is by assumption absolutely convergent at sq> and Re s > Re sq implies an \(n/Ny < dn {n/NY° By dominated convergence we can take the limit of (7.12) term by term as s —► -f oo. The only term that survives is ayy. Since (7.12) has sum 0 for all st we conclude a^ = 0. This completes the proof. Proposition 7.3. The Riemann zeta function C(s) = J2™=i ^ ml~ tially defined and analytic for Re s > 1, extends to be meromorphic for Re s > 0. Its only pole is at s = 1, and the pole is simple. Proof. For Re s > 1, we have —i— = / r*dt = y^ t-°dt. Thus Re s > 1 implies i oo rn+1 = _ + £/ (n->-r>)dt. It is enough to show that the series on the right side converges uniformly on compact sets for Re s > 0. Thus suppose Re s > a > 0 and |s| < C. By (7.11) with $0 = 0, we have I fn+1 I fn+l Ul C \Jn I " Jn "1+Re5 - "1+<r Since ]T n~(1+<7) < oo, the desired uniform convergence follows from the Weierstrass M-test. Remark. Actually £(s) extends to be meromorphic in C with no additional poles. We return to this point in §5.
2. DIRICHLET SERIES AND EULER PRODUCTS 195 We shall now examine special features of Dirichlet series that allow the series to have product expansions like the one (7.3) for C(s). We begin with some general facts about infinite products. An infinite product flnLi an w^n an £ C and with no factor 0 is said to converge if the sequence of partial products converges and the limit is not 0. A necessary condition for convergence is that an —> 1. Proposition 7.4. If \an\ < 1 for all n, then the following conditions are equivalent: (a) n^iC1^ Kl) converges (b) £^=1 Kl converges (c) n^LiC1 — l«r»l) converges. In this case, fl^riO + an) converges. PROOF. Condition (c) is equivalent with (c') FI^LiO - \an\)~l converges. For each of (a), (b), and (c'), convergence is equivalent with boundedness above. Since i + Ew<no+w>snT4r n=l n=l n=l |Un I' we see that (c') implies (a) and that (a) implies (b). For (b) => (c'), we may assume, without loss of generality, that \an\ < ^ for all n. Since 0 < x < 4 implies log- < |x| sup llogrh 1 sup (W)=21 = |z| sup *l. we have Thus (b) implies (c'). Now suppose (a) holds. To prove that f^^L1(l + an) converges, it is enough to show that I"In=Af (* "*"fln) ^en<^s to 1 as M and TV tend to oo. In the expression I N I \n-M
196 VII: DIRICHLETS THEOREM we expand out the product, move the absolute values in for each term, and reassemble the product. The result is the inequality N Y[(l + an)-l n=M N < n^+M)-1- n=M By (a) the right side tends to 0 as M and N tend to oo. Therefore so does the left side. This proves the proposition. Consider a formal product Y[ (1 + app-* + • • • + apmp-™ + ...). (7.13) p prime If this product is expanded without regard to convergence, the result is the Dirichlet series Y\™ i —, where a\ = 1 and an is given by an = apr, • • • apr* if n = p\l • - -prkk. (7.14) Suppose that the Dirichlet series is in fact absolutely convergent in some right half plane. Then every rearrangement is absolutely convergent to the same sum, and the same conclusion is valid for subseries. If E is a finite set of primes and N(2?) is the set of positive integers requiring only members of E for their factorization, we have n(i+«pp-,+-+v»p-raj+--)= e £■• p£E n£N(E) Consequently the infinite product has a limit in the half plane of absolute convergence of the Dirichlet series, and the limiting product (7.13) equals the sum of the series. The sum of the series is 0 only if one of the factors on the left side is 0. In particular the sum of the series cannot be identically 0, by Proposition 7.2d. Thus (7.13) can equal only this one Dirichlet series. Conversely if an absolutely convergent Dirichlet series J^nLi ~ nas the property that its coefficients are multiplicative, i.e., a\ = 1 and amn = aman whenever GCD(m, n) = 1, (7.15) then we can form the product (7.13) and recover the given series by expanding (7.13) and using (7.14). In this case we say that the Dirichlet series has (7.13) as an Euler product. Many functions in elementary
2. DIRICHLET SERIES AND EULER PRODUCTS 197 number theory give rise to multiplicative sequences; an example is an = ip{n)y where <p is the Euler (p function. If the coefficients are strictly multiplicative, i.e., a\ = 1 and amn = aman for all m and n, (7.16) then the pth factor of (7.13) simplifies to 1 + app-> + ■■■ + (app->r + •■■= —^;- (7-17) P* In this case out Dirichlet series has a first degree Euler product: oo , Er n ^ (7-») 1 n n=l p prime i"-? P5 This is what happens with C(5)> where all the coefficients are 1, and with an = xo(n) and an = Xi(n) as in (7.7). Conversely an Euler product expansion of the form (7.18) forces the coefficients of the Dirichlet series to be strictly multiplicative. A Dirichlet series J^nLi ~ w*tn la*» I ^ n° ^or some real c 1S absolutely convergent for Re s > c+ 1. This fact leads us to a convergence criterion for first degree Euler products. Proposition 7.5. A first degree Euler product Y\ [1 — apP~s]~~l ^vith \ap\ < pc for some real c and all primes c defines an absolutely convergent Dirichlet series (and hence a valid identity (7.18)) for Re s > c-f- 1. PROOF. The coefficients an are strictly multiplicative, and thus \an\ < nc for all n. The absolute convergence follows. First degree Euler products are sufficient for Dirichlet's Theorem, but other Euler products are needed in other applications, such as with elliptic curves. To isolate the notion of degree of an Euler product, let us write (7.17) as a formal identity ^-r -r-,~ - -1-a.X pJ Here the denominator on the right is a polynomial of degree < 1 with constant term 1, and it is in this sense that the Euler product (7.18) has degree 1. The expansion (7.13) is called a kth degree Euler product
198 VII: DIRICHLET'S THEOREM if, for each prime p, there is a polynomial PP(X) G C[X] having degree < k and zero constant term such that ' l-Pp(X) as a formal identity. Let us factor I — PP(X) over C as l-Pp(X) = (l-r(»X)-..(l-rPx). We call the complex numbers rpJ' the reciprocal roots of 1 — PP(X). Proposition 7.6. A kth degree Euler product f] [1 - ^(P"')]"1 whose reciprocal roots satisfy \rp\ < pc for some real c and all primes p defines an absolutely convergent Dirichlet series for Re s > c + 1. For such s the sum of the Dirichlet series equals the Euler product. Proof. We apply Proposition 7.5 to \[ t1 ~ rjj'p"'] for each j. The product of absolutely convergent Dirichlet series can be rearranged without affecting the sum, and the result is an absolutely convergent Dirichlet series. The final thing we need to know about Euler products is how to recognize when they are of degree k. We shall be interested only in the quadratic case, but it merely obscures the idea to make such a restriction right away. Proposition 7.7. Let 1 + b,X + b2X2 + • - • + bmXm + . • ■ (7.19) be given. Then there is a unique polynomial Q(X) = c0 + ClX + • • • + c^X*-1 (7.20) of degree < k — 1 such that 1 1 - XQ{X) = 1 + XQ(X) + X2Q{Xy + • • • (7.21) is congruent to (7.19) modulo Xk+1. For this polynomial Q(X), the expressions (7.19) and (7.21) are equal if and only if bm = c06m_i + Ci6m_2 + h ck-ibm-k for all m > k. (7.22)
3. FOURIER ANALYSIS ON FINITE ABELIAN GROUPS 199 Proof. The condition that (7.19) and (7.21) be equal is the condition that 1 — XQ(X) be an inverse to (7.19) in the ring of formal power series over C. Consider the equation (1 - c0X - ClX2 cfc_!A*)(l + bYX + b2X2 + • • •) = (l + dlX + d2X2 + -"). The conditions d\ = • • • = d* = 0 uniquely determine Co,... ,cjk_i recursively. Once Co,..., cjt_i are fixed in this fashion, dm for m > k is given as the difference of the left side minus the right side in (7.22). The result follows. Corollary 7.8. An Euler product (7.13) with a\ = 1 is of degree 2 if and only if there exists for each prime p a complex number dp such that dprn =■ dpCLpm-l + dpdpm-l (7.23) for all m > 2. In this case the pth Euler factor is i-»r'-<MT»- <"« Proof. Let bm = apm in Proposition 7.7. The polynomial Q{X) works out to be cq + c\X with bi = cq and 62 = Cq + Ci. From (7.23) with m — 2, dp has to be apa — a£. So we define it this way to make (7.23) valid for m - 2. Now t/p = api - a2 = 62 - 6j = 62 - Cq = C! . Hence (7.23) for m > 2 is the same equation as (7.22) for m > 2. Thus the corollary follows from Proposition 7.7. 3. Fourier Analysis on Finite Abelian Groups In considering primes of the forms 4fc + 1 and ik + 3 in §1, we worked with the sum and difference of the expressions (7.5) and then reconstructed the individual expressions from the sum and difference. These steps correspond to doing Fourier analysis on the group Z£ = Z2. In this section we shall work out the Fourier analysis of Z£ that is appropriate for handling primes of the form km 4- b.
200 VII: DIRICHLET'S THEOREM Let G be a finite abelian group (such as Z£). A character of G is a homomorphism of G into 51 C C*. The characters of G form a finite abelian group G under pointwise multiplication: (xx')(9) = x(g)x'(g). Proposition 7.9. Let G be a finite abelian group. With respect to the Hermitian inner product (F, Ff) = ]£yG<? F(g)F'{g) on the vector space of all complex-valued functions on Gy the members of G form an orthogonal basis, each \ G G satisfying ||x||2 = \G\. Consequently \G\ = |G|, and any function F : G —► C is given by the "sum of its Fourier series": F^) = iir E (E F(ft)H^) J X(ff). (7-25) |G| x€g W / TERMINOLOGY. Equation (7.25) is called the Fourier inversion formula. Proof. First we prove that distinct characters \ and x' are orthogonal. If x and x' are given with x ^ x'> ^ x" = XXf- Choose gQ 6 G with x"(9o) ^ 1. From \9eG } g£G g£G we see that U-x"(<7o)]Ex"(0) = ° and hence ^X"(j) = 0- g£G g£G Consequently <x,x,> = Ex(*)x,aif) = Ex,,(») = o, and the orthogonality is proved. It follows that the members of G are linearly independent and hence that \G\ < \G\. It is clear that IMI2 = £lxGOI2 = £i = |G|- geG geG
4. PROOF OF DIRICHLET'S THEOREM 201 To see that the members of G are a basis, we write G as a direct sum of cyclic groups. A summand Z^ of G has at least JV distinct characters, given by j mod N —► e2nijr/JV for 0 < r < N — 1, and these characters extend to G as 1 on the other summands of G. Taking products of such characters from different summands of G, we see that \G\ > \G\. Therefore G is an orthogonal basis of the space of all complex-valued functions on G. Formula (7.25) is the usual inner-product-space formula for expressing an element in terms of an orthogonal basis. 4. Proof of Dirichlet's Theorem The proof of Dirichlet's Theorem is a direct generalization of the argument given for m = 4 in §1. Fix an integer m > 1. A Dirichlet character modulo m is a function x •" Z —► Sl U {0} such that (i) x(j) = 0 if and only if GCD(j, m) > 1 (ii) x(j) depends only on the residue class ,; mod m (iii) when regarded as a function on the residue classes modulo m, x is a character of Z£. In particular, a Dirichlet character modulo m determines a character of Z£. Conversely each character of Z£ defines a unique Dirichlet character modulo m by lifting the character to {j £ Z | GCT>(j,m) = 1} and by defining a Dirichlet character to be 0 on the rest of Z. It will often be notation ally helpful to use the same symbol for the Dirichlet character and the character of Z£. Because of this correspondence, the number of Dirichlet characters modulo m is <p(m)y where <p is the Euler <p function. The principal Dirichlet character modulo m, denoted xo, is the one built from the trivial character of Z£: fl ifGCD(i,m) = l X°UJ \o ifGCD(i,m)>l. Each Dirichlet character modulo m is strictly multiplicative, in the sense of (7.16). We assemble each as the coefficients of a Dirichlet series, the Dirichlet L function, by ^(••X) = f;^. (7-26) n = l
202 VII: DIRICHLET'S THEOREM Proposition 7.10. (a) The Dirichlet series L(s}x) is absolutely convergent for Re 5 > 1 and is given in that region by a first degree Euler product ^>*)= n —^r (727) p prime 1 - *±*-l P° (b) If x is not principal, then the series for L(sy x) is convergent for Re 5 > 0, and the sum is analytic in that region. (c) For the principal Dirichlet character xo modulo ra, L(s,xo) extends to be meromorphic for Re s > 0. Its only pole is at $ — 1, and the pole is simple. It is given in terms of the Riemann zeta function by L(slXo) = «s)]l(l~±). (7.28) P\m V V J Proof. For (a), the boundedness of x implies that the series is absolutely convergent for Re s > 1. Since \ is strictly multiplicative, L(s,x) has a first degree Euler product, by (7.18), and the product is convergent in the same region. For (b), let us notice that X ^ Xo implies m ^2 x(n + b) = 0 for any 6, (7.29) n=l since the member of Z^ that corresponds to x is orthogonal to the trivial character, by Proposition 7.9. For s real and positive, let us write in the notation of (7.10), putting Un = J3£=1 u*- Equation (7.29) implies that {Un} is bounded, say \Un\ < C. By summation by parts (7.10), < Y^ r f± - l \ <L ~ 2-* \n* (n+l)') + N' + This expression tends to 0 as M and N tend to oo. Therefore the series (7.26) is convergent for 5 real and positive. By Proposition (7.2a), the series is convergent for Re s > 0, and the sum is analytic in this region. For (c), let Re s > 1. From (7.27) with X = Xo, we have l(s}Xo)=n —p pirn 1 Vs Using (7.3), we obtain (7.28). The remaining statements in (c) follow from Proposition 7.3, since the product over p with p \ m is a finite product. ST X(n) n=M C _ 2C
4. PROOF OF DIRICHLET'S THEOREM 203 By Proposition 7.10b, L(sy \) is well defined at s = 1 if x is not principal. The main step in the proof of Dirichlet's Theorem is the following lemma. We defer the proof of the lemma until we have shown how the lemma implies the theorem. Lemma 7.11. L(l, x) ^ 0 if x is not principal. Proof of Theorem 7.1. First we show for each Dirichlet character X modulo m that logL(s>X)= £ *&+9(s,x) p prime (7.30) for real s > 1, where g(s,x) remains bounded as s J 1. In this statement we have not yet specified a branch of the logarithm, and we shall choose it presently. Fix p and define, for $ > 1, a value of the logarithm of the pth factor of (7.27) by log 1 Since x(p) ps x(p) x(p) i*(p2) ix(p3h ■ + -———r-r—r-—I"' x(p) p« 2 p 2» 3 P ,35 +g(s,P,X). (7-31) < -, this logarithm satisfies log- < \z\ sup \w\<\z\ = \z\ sup M<l*l A. dw 1 1 — w < 2|*|2 - 1 for \z\ < i. With z = *-^, we therefore have lff(s,P.X)l<2 X(P) V 4 SiDce EpprimeP 2 ^ E^=i n 2 < °°> the ***** £P0(5>P>x) is uniformly convergent for s > 1. Let g($,x) he the continuous function ^2pg(SjPyx)' Summing (7.31) over primes p, we obtain X> i i- x(p) p' P
204 VII: DIRICHLET'S THEOREM By (7.27) the left side represents a branch of ]og L(s,x)- This proves (7.30). Define a function 6& on the positive integers by c / \ _ f 1 \f n = b mod m b^ ' ~ \ 0 otherwise. Proposition 7.9 gives *<») = ^jE*<»)x(»). Multiplying (7.30) by *(6) and summing on x, we obtain ^(m) £ i = £^log^x)-£^(*,x). (7-32) p prime, p=km+b The term £x x(%(*> x) is bounded as s j 1, by (7.30). The term Xo(^)logL(5,xo) is unbounded as s J 1, by Proposition 7.10c. For x nonprincipal, the term x(&)l°g£(s,x) ls bounded as s [ 1, by Proposition 7.10b and Lemma 7.11. Therefore the left side of (7.32) is unbounded as s [ 1. Hence the number of primes contributing to the sum is infinite. Proof of Lemma 7.11. Let Z(s) = [~[x £(s>x)- Exactly one factor of Z(s) has a pole at $ = 1, according to Proposition 7.10. If any factor has a zero at s = 1, then Z(s) is analytic for Re s > 0. Assuming that Z(s) is indeed analytic, we shall derive a contradiction. Being the finite product of absolutely convergent Dirichlet series for Re s > 1, Z(s) is given by an absolutely convergent Dirichlet series. We shall prove that the coefficients of this series are > 0. More precisely we shall prove for Re s > 1 that *<•> = II t \-r^ • <7-33) where f(p) is the order of p in Z£ and where g(p) = <p(m)/f(p). The factor (l-p-'W')-1 is given by a Dirichlet series with all coefficients > 0. Hence so is the g(p)th power, and so is the product over p of the result. Thus (7.33) will prove that all coefficients of Z(s) are > 0.
4. PROOF OF DIRICHLETS THEOREM 205 To prove (7.33), we write, for Re s > 1, ( Z(s) = l[L(s,X) = 1[[ » v 1 _ : „„. x V' ) \ f J Fix p not dividing m. We shall show that ?,.sa "S ?,.W n(>-^)-(-£)'• where / is the order of p in Z£ and where g = <p(m)/f; then (7.33) will follow. Now x —* x(p) is a homomorphism of Z£ into 5*1, hence into {e2*%kM} and onto some cyclic subgroup {e2wikH } with /' dividing /. We show f = /. In fact if/' < /, then pf> ^ 1 mod m, while xip1') = x{p)f' = 1 for all x> since x(j^ ) = x(l) for all x, the x's cannot span all functions on Z£, in contradiction to Proposition 7.9. Thus x —f *(p) is on^o {e25r**^}. In other words, xQ?) takes on all /th roots of unity as values, and the homomorphism property ensures that each is taken on the same number of times, namely g = <p{m)/f times. If X is an indeterminate, we then have H(l-X(p)X)= (f[(l-e**ik"X)) =(1-*')'. X \*=0 / Then (7.34) follows and so does (7.33). Hence all the coefficients of the Dirichlet series of Z(s) are > 0. Let us write Z(s) = Yl^Li ~ anc* Put s0 = inf{s > 0 | 2J ~7 converges}. 1 n n = l Then so < 1. Suppose so > 0. Since ^2ann~s converges uniformly on compact sets for Re s > so (by Proposition 7.2a), we can compute its derivative term by term. Thus *<%. +I). £-<;*■>''■ (7.35)
206 VII: DIRICHLET'S THEOREM The Taylor series of Z(s) about so + 1 is N=0 and is convergent at s = ^$o since Z(s) is analytic in the open disc centered at so + 1 and having radius so + 1. Thus *(**») = £ T^1 + ^o)N(-l)NZ(NHso + 1), with the series convergent. Substituting from (7.35), we have 7V=0n = l This is a series with positive terms, and Fubini's Theorem allows us to interchange sums and obtain n=lAT=0 v-^ ^ Qn^og^; l1 -i- 2^ ^L--Oogri)(l + ijo) oo In other words, the assumption $o > 0 led to a point between 0 and 5o (namely ^so) where there is convergence. This contradiction proves that 5o = 0. Therefore Y^ann'a converges for Re s > 0. Since the coefficients are positive, the convergence is absolute for s real and positive. By Proposition 7.2b the convergence is absolute for Re 5 > 0. Therefore the Euler product expansion (7.33) is valid for Re 5 > 0. For p \ m and for real s > 0, we have ('-£) L_ = (i+p-/» + p-v. + ...)i > i+P-Js' + P~Vs' + ■■■ = 1 + p-rt"1)' + p-Mm)« + .. . 1 pV(m)s
5. ANALYTIC PROPERTIES OF DIRICHLET L FUNCTIONS 207 Therefore J[L(s,X) n p\m 1 — 1 > n (i+^(m),+p-Mm),+--)=f:^)7- p prime n=l The sum on the right is -foo for s = —.—-, while the left side is finite. <p(m) This contradiction completes the proof of the lemma. 5. Analytic Properties of Dirichlet L Functions The Riemann zeta function £(s) and the Dirichlet L functions L(s,x) encode arithmetic information, and Dirichlet's Theorem comes out as a consequence of properties of these functions. The properties in question are the analytic continuation and the behavior at s = 1 for £(s) and each L(s,x)- Other theorems about primes can be deduced from other properties of these functions; for example, the Prime Number Theorem about the asymptotic number of primes < N is a consequence of the nonvanishing of £(s) for Re s = 1. In the case of elliptic curves over Q, we shall introduce in a later chapter a zeta function and an L function that are defined by Euler products and encode geometric information, and deep properties of the elliptic curve come out (partly conjecturally) as a consequence of properties of these functions. This is part of a general pattern in algebraic number theory and algebraic geometry, that L functions are used to encode information prime by prime and that properties of these L functions are expected to yield deep insights into the original problem being studied. Let us refer generally to such arithmetic or geometric L functions as motivic, for reasons explained in the preface. The question is how to get an analytic handle on these functions in order to exploit their properties. For £(s) and L(s,x), we obtained enough information by direct arguments to prove Theorem 7.1. But for more subtle analytic properties, one uses an indirect approach. The key idea is to relate such a function to a suitable analytic function in the upper half plane with certain transformation properties, a so-called "modular form." Each modular form has an L function, defined explicitly by a Mellin transform and given as a Dirichlet series. We shall refer generally to this kind of L function as
208 VII: DIRICHLETS THEOREM automorphic. Automorphic L functions have more manageable analytic properties, but they initially have little to do with algebraic number theory or algebraic geometry. The fundamental objective is to prove that motivic L functions are automorphic. In the case of £(s) and L(s,x)> we shall achieve this objective in this section, using suitable theta functions as the modular forms. The analytic properties that we shall derive as a consequence are their mero- morphic continuations to all of C, the functional equations that they satisfy, and the identifications of their poles. In the case of the L function of an elliptic curve over Q, the assertion that this L function is always automorphic is substantially the Taniyama-Weil Conjecture. This theme will occupy us for the remaining chapters of the book. If the L function is automorphic in the expected way, it extends to an entire function and satisfies a functional equation. In particular, this L(s) must extend to be defined in a neighborhood of s = 1. As mentioned in Chapter I, the Birch and Swinnerton-Dyer Conjecture relates the behavior of L(s) near s = 1 to the rank of the group of Q points of the elliptic curve, among other things. We begin by treating C(5)> returning to L(s,x) afterward. The theta function that we shall use is oo oo 0(t) = ^ ein*rT = 1 + 2 J2 c,n2'T, (7.36) n = —oo n=l analytic for Im r > 0. Its role in helping us to understand £(s) comes from the identity OO rOO e-n2*v±*-i da = / c-*(n2x)-(i--i)xi--i(n2x)-i dx Jo = n-T(£«)ir-*\ (7.37) Summing on n for n > 1 and interchanging sum and integral by Fubini's Theorem, we obtain roo C(«)r(£*)x-*' = / I[0(tV) - I]**5"1 da (7.38) for Re s > 1. In the terminology that we shall use in later chapters, C(s) is, except for harmless factors, the "Mellin transform" of 0(icr) — 1, and 0(t) is a modular form of "level" 2 and "weight" \. Except for a growth condition, these modular form properties are the content of the following proposition.
5. ANALYTIC PROPERTIES OF DIRICHLET L FUNCTIONS 209 Theorem 7.12. The analytic function 0(r), defined for Im r > 0, satisfies (a) 0(r + 2) = 0(r) (level 2) (b) 0(-l/T) = {T/i)V*0(r), (weight I) where the square root is the principal value cut on the negative real axis. Before proving this proposition, let us derive some analytic consequences for C(s). Corollary 7.13. The Riemann zeta function C(5)» initially defined by (7.1) for Re s > 1, extends to be meromorphic in C. Its only pole is at s = 1, and that pole is simple. Moreover A(«) = C(«)r(i«)*-** (7.39) satisfies the functional equation A(l-«) = A(«). (7.40) Remark. Notice that if C(s) is defined only for Re s > 1, then A(l — s) and A(s) have disjoint domains and (7.40) has no content. It is necessary first to extend £(s) to Re s > 0. Then (7.40) is meaningful for 0 < Re s < 1 and extends to an identity on C. The extension of C(s) to Re s > 0 has already been carried out in Proposition 7.3, but a little care is required to interpret (7.38) as an identity in this whole region, not just in Re s > 1. Proof. For a > 1, l[0(ia) - 1] = £ e-»a" < J2 *-k" = 7^7 < (1 " O-1*-". n=l *=1 l and thus \[0{ia) - \)ct^-1 dcr (7.41) converges for all s € C and defines an entire function. For real s > 1, we rewrite (7.38) as A(«)= / \0(i<r)<rt*-ld<r-\ f a^"1 d<r+ j \[0(ia)- l]**-1 da. Jo Jo J\ ,
210 VII: DIRICHLETS THEOREM On the right side the second term equals -1/s. The first term, by Theorem 7.12b, is The change of variables a —► 1/(7 shows that this is /OO 1 i[*(,-„)-i]«r4o->-irf(r__i_. Therefore A(«)= J™ \[0(i<T)-\]a±s-ld<r + J™ \[0{i<T)-l]<T^X-S^ld<T The first two terms extend to be entire, by (7.41), and the other two terms extend to be meromorphic in C with poles at s = 0 and s = 1, both simple. Therefore A(s) extends to be meromorphic in C with poles only at s = 0 and 5=1. Since (7.42) is unchanged under s —► 1 — s, (7.40) follows. From (7.39) and the meromorphic nature of A(s), we see that C(s) is meromorphic in C. Since T(^s) is nowhere vanishing, C(s) can have poles only at s = 0 and 5=1. Since T(^s) and A(s) both have simple poles at s = 0, C(s) is in fact analytic at s = 0. We turn to the proof of Theorem 7.12. The tool is the Poisson Summation Formula on R1, which is a result about the Fourier transform. The Fourier transform on R1 is the operator on Lebesgue integrable functions given by /(«)= / e-2*ituf(t)dt. J — OO If also / is continuous and / is integrable, then the Fourier inversion formula says /(*)= f°° e2rituf(u)du. J — OO (7.42) _1 1_ 1 — s s
5. ANALYTIC PROPERTIES OF DIRICHLET L FUNCTIONS 211 For our purposes a useful class of integrable functions is the space of Schwartz functions {I ,jt - is bounded for each "| f€C°°(Rl)\p(t)-^(t) integer * > 0 and L I each polynomial P ) The Fourier transform carries «S(RX) one-one onto «S(RX). Theorem 7.14 (Poisson Summation Formula). If/ is in ^(R1), then oo oo J2 f(x + n)= £ /(n)e2""*. n = —oo n=—oo Proof. Define F(x) = J2™=-oof(x + n)- From the definition of ^(R1), it is easy to check that this series is uniformly convergent and so is the series of kth derivatives, for each k. Consequently the function F is well defined and C°°, and it is periodic of period one. Such a function is the sum of its Fourier series: F(x)= ]T ( f F(t)e~2wint dt) e2Hnx. (7.43) n = -oo \J<> ' The Fourier coefficient in parenthesis above is / F(t)e-2*ini dt = ( J2 f{t + k)e-2*intdt Jq Jq Jb = -oo oo ri = E / f(t + k)e-2irintdt k = -ooJo = £ / f(t)e'2winidt k = -ooJk /oo f(t)e-2nint dt ■oo = /(")• (7.44) The theorem follows by substituting (7.44) into (7.43). For Theorem 7.12 the Poisson Summation Formula is to be applied to f(t) — e~wt and its dilates. In treating Dirichlet L functions, we shall apply the formula also to f(t) = te~Ti . The relevant Fourier transforms are given in the next proposition.
212 VII: DIRICHLET'S THEOREM Proposition 7.15. (a) (<r"2r= e-™7 (b) (te-wi9F=-iue-wu*. Proof. For (a) we form the line integral over a rectangle in C of the entire function e~*2 . The rectangle extends from —N to N in the x direction and from 0 to u in the y direction. The Cauchy Integral Theorem says that e~*x2 dx- e-*(*+.'u)2 dx + (contribution from each end) = 0. ■N J-N As N —► oo, the contribution from each end tends to 0, and we obtain /OO AOO e-"2dx= / c-*(x+''u>2 dx. -oo J—oo Multiplying by e~*u , we see that (e-T<2)» =€"Tu3 r e-**2dx. J — oo The integral on the right side is 1, as we see by evaluating its square /■^oo .Coo e~*^x +y ^dxdy in polar coordinates. This proves (a). Part (b) follows by differentiating both sides of the identity e-wt2e-2witudt = e ' — oo with respect to u. f J — < Proof of Theorem 7.12. Conclusion (a) is immediate from (7.36). For (b), let f(t) = e""3 and define fr(t) = f{r~H) for r > 0. Changing variables in the definition of Fourier transform, we immediately obtain /r(u) = r/(ru). (7.45) For our particular /, Proposition 7.15a therefore gives fr(t) = e~Tr"~V and fr(u) = re-*rV. Applying Theorem 7.14 to fr with x = 0, we find that e-«-a»'=r V" e-'rJ"J. n = — oo n = —oo With r = o"1/2 and (T > 0, this identity says .(.±)=,■/*,*). This is conclusion (b) for r = ia} and (b) follows by analytic continuation.
5. ANALYTIC PROPERTIES OF DIRICHLET L FUNCTIONS 213 The remainder of this section will give a parallel development for Dirichlet L functions. There is one complication, namely in isolating what Dirichlet characters to consider. For example, the L function for the principal Dirichlet character Xo modulo m has an Euler product that is the same as the one for £(s) except that the factors where p \ m are dropped. So we should be content with having the functional equation for C(s) yield a functional equation for L(s,Xo)- Here are two subtler examples. Example 1. Let m = 8 and define xO) = x(5) = 1 and x(3) = X(7) = —1. The resulting L function coincides with L(s,x')> where x' is given with m = 4 by x'(l) = 1 and x'(3) = —1- Example 2. Let m = 6 and define x(l) = 1 and x(5) = — 1. Also let m = 3 and define *'(1) = 1 and x'(2) = -1. Then Again one should be content with having the functional equation for one of these L functions yield the functional equation for the other. We say that two nonprincipal Dirichlet characters x modulo m and x' modulo m' are associate if x(p) = x'(p) f°r all but finitely many primes. This is an equivalence relation. The conductor of an equivalence class is the least integer m" such that the class contains a nonprincipal Dirichlet character modulo m"'. In Example 1, x an^ x' are associate with conductor 4; in Example 2, x and x' are associate with conductor 3. A Dirichlet character modulo m is primitive if its conductor is m. If X is a Dirichlet character modulo m, we can convert x into a Dirichlet character x* modulo am by defining y*M=(xW) ifGCD^) = 1 K KJ) \ 0 ifGCD(a,i)>l Let us call x* an extension of x- If X* IS an extension of x, then x and x* are associate. Suppose x and x' are Dirichlet characters modulo the same m that are associate. Let us prove that x = x'- In fact, suppose GCD(m,6) = 1. By Theorem 7.1 there are infinitely many primes p with p = b mod m. For all but finitely many of them, hence for at least one of them, we have x(p) = x'(p)- Therefore x(^) = x'(&)- If GCD(m,6) > 1, then X(ft) = *'(*) = 0. Hence x = x'. Consequently an equivalence class of associate Dirichlet characters contains a unique primitive Dirichlet character.
214 VII: DIRICHLET'S THEOREM Proposition 7.16. If Xm is a Dirichlet character modulo m and Xm1 is an associate Dirichlet character modulo m', then there exists a Dirichlet character xgcd modulo GCD(m,m/) such that Xm and Xm' are extensions of Xgcd- a' Proof. Write m — Y\p*j and m! = ]"[?/• Put n,- = min(aj,a^) and Nj = max(a;,aj). By the Chinese Remainder Theorem, Zm = 0Z «> as rings, and thus Z£ ^ 0Zx«i. The proof of Proposition 7.9 then shows that (Z£y » ©(Z*., )~ Let us write Xm = (• • • > XPV .■■■)• The extension of Xm to Z£CM,m m,x is accomplished by extending each xpv to 1XN.. Thus we have Xm — v • • i X «j j • • • J* We argue similarly with Xm;» obtaining a similar formula. Since Xm and Xm' are associate, so are Xm and Xm" Then xm = Xm/ since both Dirichlet characters occur modulo LCM(m,m/). This equality implies X#«, = X#«' f°r all j. In other words, x "> = X#«>- Putting Xgcd = P; P.J pi p> (..., x n>, • • •), we see that Xm and Xm' are both extensions of xgcd- pi Corollary 7.17. Any Dirichlet character modulo m is an extension of the unique primitive Dirichlet character x' with which it is associate. Consequently the conductor of x is a divisor of m. Proof. We apply Proposition 7.16 to x and x'- Then x and \' are both extensions of xgcd- Since x' is primitive, x' = Xgcd- Thus x is an extension of Xgcd- It follows that any L{s, x) can be obtained by deleting finitely many factors from the Euler product of some L(s,x') with \' primitive. For purposes of deriving functional equations, it is therefore enough to treat the primitive case. The way that "primitive" will enter the argument is through the following lemma. Lemma 7.18. If x is a primitive Dirichlet character modulo m and if c(m,x) denotes the Gauss sum m-l c("»,x)=I>a''*/mx(*), (7.46) *=o
5. ANALYTIC PROPERTIES OF DIRICHLET L FUNCTIONS 215 then m-l Yl e2"ink/mx(k) = ^Wc(m, X) (7.47) *=o for every integer n. Remark. It is easy to check that the nonprimitive \ in Example 2 above does not satisfy (7.47) for n = 2. Proof. If GCD(n,m) = 1, then m—1 m —1 E e2rtBt/mx(*)x(n) = 2 e2'tofc/"*x(*n) Jb = 0 Jk=0 m-l = J]e2*,"*'mx(*)=c(mlx). *=o Letting n"1 denote an inverse of n in Zm, we multiply through by x(n_1) = x(n) and obtain (7.47). If d = GCD(n, m) > 1, we are to prove that the left side of (7.47) is 0. Let m! = m/d. The natural map Zm —♦ Zm, has kernel K = {aGZm |a = lmodm'}. If xl/f were to equal 1, then \ would descend to Zm/ and define a Dirichlet character \' modulo m' such that \ = x'#- Since x is primitive, this descent cannot happen, and we conclude that x\k is a nontrivial character of K. The left side of (7.47) is m'-l e2*ik'm'x(k) m-l = E< Jb = 0 m;-l r=0 ,2-Kikfm 2irir/m 'xW = m'-l E r=0 '' E *(*) E 0<Jb<m-l fc=rmodm' m'-l = E c2'<r/m'x(»-) E x(°)- Since x|/r is nontrivial, the inner sum is 0. Thus (7.47) is 0.
216 VII: DIRICHLETS THEOREM Theorem 7.19. Let \ be a primitive Dirichlet character modulo m, and define 0(r, \) for Im r > 0 as { E?=-oo x(n)e'n''T/m = 2£~=i x(n)ein2"/m if X(-l) = 1 HL-co X(«)nc-3"/- = 2 £~=1 x(n)nein*rT,m if x(-l) = "1 Then (a) 0(r + 2m,x) = 0(r,x) (b) tf(-l/r,X) = &jj£(T/i)W0(T,x) if X(-1) = 1 (b') *(-l/r,X) = ~ic{pX\r/i)3f2e(r,x) if x(-l) = -1 The Dirichlet L functions L(s,x) and £(5>x) extend to be entire in s and have the following corresponding properties: (c) If x(-l) = 1, then A(s,X) = L{8,x)rn*T{±8)ir-ia satisfies A(*,X) = ^^A(l-5,x) (7.48a) (c') If x(-l) = -1, then A(«lX) = L(s,x)™*('+1)r(|(s + l))ir-i('+1> satisfies A(s)X) = -M^) A(1 _ 5>x). (7.48b) Remarks. Applying (c) or (c') first to x and then to x'> we see that c(m,x)c{myx) = mx(-l). We readily check from the definition that c(™,x) = X(-l)c(m,x). Consequently the coefficient c(myx)/y/m in the functional equation (7.48) has absolute value one. Proof. Result (a) is clear. For (b), define /(«) = ('"*'* ifX<-1)=1 (7.49) \le—' ifx(-l) = -l '
5. ANALYTIC PROPERTIES OF DIRICHLET L FUNCTIONS 217 The Poisson Summation Formula (Theorem 7.14) applied to / with x = k/m gives E '(£ + ")= E /(n)e2"nt/m n=—oo ^ ' n = — oo Multiplying by x(ib), summing on k from 0 to m — 1, and using (7.47) yields oo m—1 • , v oo m —1 E E x(*)/ (£+»)= E /(») E e2"nt/mx(*) n=-oo jfc=0 ^ ' n = -oo Jb=0 oo = c(n».x) E /(")*(")• (7.50) The left side of (7.50) is oo m— 1 = £ j;*(*+m,,)/(i±pn) n = -oo k=0 X ' -f:^(=)- Jb=—oo N Applying (7.45) to the resulting form of (7.50), we have E X(t)/f—)=c(mlX)r E /(™)x(n). (7.51) fc = -oo ^ ' n = -oo If x(—1) = 1, we choose / as in (7.49) and put r = \J<r/m\ application of Proposition 7.15a yields E X(fc)e-*2""1/m = C-^T=T°112 E xMe-n3"/m- fc=-oo ^ n = -oo This is (b) for r = ur, and (b) for general r follows by analytic continuation. If x(—1) = —li we choose / as in (7.49) and again put r = yjajm\ application of Proposition 7.15b yields E x(*) Jfc = -oo
218 VII: DIRICHLET'S THEOREM This is (b;) for r = ia, and (b;) for general r follows by analytic continuation. For (c), suppose x(—1) = 1- In analogy with (7.37), we have da A(',x) = f;^m*T(l5)ir-4' Jb=l = J" \B{i<TiX)^a-1 da (7.52) for Re 5 > 1. The formal argument, disregarding convergence, is to replace a by a~l and then apply (b). The above expression becomes = f^ \0{i/a)a-^-1 da Jo = *^>A(1-,,*) by (7.52). To make this argument precise, we observe by the same argument as for (7.41) that / ^(ia.x)^9'1 da (7.53) converges for all s £ C and defines an entire function. For Re s > 1, we rewrite (7.52) as A(«,X) = J ^{i^x^'-'da + J00 \0{i<T,X)<r*'-'d<j and transform just the first term, replacing a by a"1 and applying (b). The result is A(«,x) =C-^=fJ" ^(.V,*)^1-)-1 de + j~ \9(ia,X)*^ da. (7.54)
5. ANALYTIC PROPERTIES OF DIRICHLET L FUNCTIONS 219 In view of (7.53), both terms extend to entire functions of sy and therefore A(s, x) extends to be entire in s. Using (7.54) with x} we have g(m>x) y/in A(1 _ .,*) = ^M f I^V,^'-1 da vm ^ (7.55) To prove (c), we need to show that the right sides of (7.54) and (7.55) are equal, and this comes down to proving that cim.xyim.x) = mx(-l) (7.56) (as was asserted in the remarks before the proof). To prove (7.56), we write m-l c(m,X)c(m,x) = £ e2*ik'mX(k)c(m,x) Jb=0 m —1 m— 1 = ^ c2«*/m ^ e2^»H/m^yy by Lemma 7 lg Jfc=0 /=0 m—1 m—1 = V^ e2wik/m V^ e-2wikl/m / j\ Jb=0 /=0 m —1 m —1 1=0 k=0 m-l = X(~"l) 5Z *(0^Um by Proposition 7.9 /=o = rnx(-l). This proves (7.56). The analyticity of L(s,x) is obtained from that of A(s, x) by the same argument used with £(s) and A(s) in Corollary 7.13, and (b) follows. For (c'), suppose x(_l) = — 1- In analogy with (7.37) and (7.52), we
220 VII: DIRICHLET'S THEOREM have = / ^x(*)*c-*2'"/mo-i('+1)-1<fo- Jo fc=i = I" \e(i<Ttx)°*{l+l)-1 to (7-57) for Re s > 1. The formal argument, disregarding convergence, is to replace a by <r~l and then apply (b'). The above expression becomes = f° $*(«•/*, x)*-*(f+1)_1<fc- .70 _ -ic(m,x) y/rn ■A(l-«,x) by (7.57). This argument is made precise in the same way as for (c), by means of (7.56). The analyticity of L(s,x) is obtained by the same argument used for £(s) in Corollary 7.13, and (c') follows.
CHAPTER VIII MODULAR FORMS FOR S£(2,Z) 1. Overview In Chapter VI we established a correspondence A <-► E of lattices in C with elliptic curves defined over C. By way of introduction to the subject of modular forms, we shall now let the lattice vary. Then G4, Gq, and A lead to analytic functions on the upper half plane with special transformation properties (under the group SL(2, Z)) and certain growth conditions that we list in §2. Analytic functions of this kind will be defined in §2 to be modular forms. Cusp forms will be modular forms with an additional vanishing property, and A will be an example. To each cusp form, we shall associate in §3 an L function by means of a Mellin transform. This L function will be given by a Dirichlet series and will extend to be entire on C and to satisfy a functional equation. The proof will be much like that for the Dirichlet L functions in Theorem 7.19, and the functional equation will be of the type in (7.48). The remainder of the chapter will introduce and use Hecke operators as a tool for expanding the L functions of selected cusp forms as Euler products. In Chapter IX we shall extend the theory of Chapter VIII to modular forms and cusp forms whose transformation properties are relative to certain special subgroups of SL(2,Z). After that point, we return to elliptic curves. For an elliptic curve with integer coefficients, we count the number of solutions for each finite field and assemble this information into an L function for the elliptic curve, which is given as an Euler product and is hard to use. The result is that we have two kinds of L functions, the kind from cusp forms that we understand very well and the kind from elliptic curves that contains a great deal of information. Eichler-Shimura theory observes that cusp forms with special properties can be used to parametrize the points of elliptic curves over Q. Moreover, the L function of the cusp form coincides with the L function of the elliptic curve. Elliptic curves that can be parametrized in this way are called modular. The Taniyama-Weil Conjecture is the assertion that every elliptic curve is modular. There are many equivalent formulations, and there is an algorithm for deciding the conjecture for any particular curve. 221
222 VIII: MODULAR FORMS FOR 5L(2,Z) If the conjecture is true, then restrictions on the nature of modular forms can be brought to bear on the theory of elliptic curves. A dramatic example of this process is the theorem that the Taniyama-Weil Conjecture implies Fermat's Last Theorem. A counterexample to Fermat *s Last Theorem would yield an elliptic curve with remarkable properties, and one proves with some effort that such an elliptic curve cannot be modular. 2. Definitions and Examples Let us recall the correspondence A <-> E of lattices in C with elliptic curves defined over C. In (6.4) and (6.5) we defined G2fc(A) = Yl ^2k for * > 2 (72(A) = 60G4(A) (/3(A) = 140G6(A). (8.1) According to (3.30), the curve y2 = 4x3 + 62x2 + 264z + 66 has A = -b2b8 - 86s, - 276§ + 9626466. Therefore the curve y2 = 4x3 - g2{A)x - g3(A) of Chapter VI has A(A) = (72(A)3-27(73(A)2. (8.2) Similarly the j invariant is given by j(A) = 1728ff2(A)3/A(A). (8.3) The dependence on A can be simplified a little. If a is in Cx, we can compute <*»(«A) = £ ^ = £ 7^n = «-»G»(A) wGorA u,;eA V "*0 u,V0
2. DEFINITIONS AND EXAMPLES 223 and similarly A(aA) = a"12A(A) i(orA) = j(A). Let A be generated by u>i and u>2, so that A = Zu>i 0 Zu^. Possibly by interchanging u>\ and u>2, we may assume that Im(o;2/wi) > 0. Introducing t = U2/v\ and Ar = Z 0 Zr, we have A = Wl(zez(^)) =WlAr. Thus G2*, A, and j are determined by their effects on A = AT. We define G2k(r) = G3t(AT), A(r) = A(Ar), j(r) = j(AT). (8.4) We can make the same computation with a different basis {u^, u>2} for A. We have (:?)-(: i)C) with a, 6, c, (f in Z. Invertibility of this relation over Z implies ad — be = ±1. Let us see the effect on the associated Ar. We may as well start with {u>i, u>2} = {1, r}. Then {u;j, lj2} is given by tt)-(::)(0 ■(-::)■ and the associated t' is i _ ar + & "" cr + d" Since Im r > 0 and Im r' > 0, we must have ad — be = -hi, not —1. We are thus led to consider the action of SL(2,R) on the upper half plane and the special role of the subgroup SX(2, Z). The group 5L(2, R) of 2-by-2 real matrices of determinant one acts on the upper half plane 7i = {Im r > 0} in the usual way by linear fractional transformations, with g — ( , J acting by gr = -. \c a) cr-ha Let 5L(2,Z) be the subgroup with integer entries. What we saw above for an element 7 = ( , ) of SX(2,Z) was that AT D (ct -h d)Ar/, hence that Ar = (ct -h d)ATi. Therefore A7T = (cr + d)"1Ar.
224 VIII: MODULAR FORMS FOR SL(2, Z) The resulting transformation laws for the functions we have been considering are as follows: G"^= £ rmTl„x2t G2k(7T) = {cr + d)2kGMT) (m>ntr(0,0)(nlT + n)2t g2(r) = 60G4(r) g2(yT) = {cr + d)4g2(r) g3(T) = 140G6(r) g3(ir) = (cr + dfg3(r) A(r) = g2(r)3 - 27g3(r)2 A(yr) = (cr + d)12A(r) j(r) = 172802 (r)3/A(r) j(Tr) = j(r). Fix an integer fc. An analytic function on H that satisfies /(7r) = (cr + d)*/(r) for all 7 = (° J) € 51(2, Z) (8.5) is called an unrestricted modular form of weight k (for the full modular group SL(2, Z)). To have f =fc 0} k must be even. We shall drop the word "unrestricted" if an additional condition below is satisfied. Each unrestricted modular form / has a "q expansion" as follows. Taking 7 = ( J in (8.5), we see that f(r) = f{r + 1). Putting r — p + i(T, we expand in Fourier series in the p variable. Since / is smooth, 00 00 /(r)= £ «n(<r)e2*ifin = J2 ""(^e2""'2™'. n= —00 n= —00 where Then an(<r)= /' f(p + i<r)e-2"inrdp. •'-■3 n(<7)e2Tmy = / /(/> + .V)c-2ir,"n<>+,'*> dp. (8.6) J — 4 Imagine a rectangle in the r plane extending in the horizontal direction from p = — £ to H-^ and in the vertical direction from <j\ — a to some (T2. The integral (8.6) then represents the bottom portion of the line integral ' f{r)e-2*inT dr f-
2. DEFINITIONS AND EXAMPLES 225 over this rectangle. The total line integral is 0, by the Cauchy Integral Theorem, and the sides cancel because f(r) = /(r+1). Thus the bottom portion equals the top portion when they are oriented the same way. In other words, (8.6) is a constant as a function of a, say constantly equal to cn. We conclude that oo /(r)= £ cngn with, = e2*'T. (8.7a) n = —oo Here cn= f(r)e-2*inT dp for any a > 0. (8.7b) Expression (8.7) is called the q expansion of/, or, for reasons discussed in §3, the expansion of / at oo. We say that an unrestricted modular form / is holomorphic at oo and is a modular form if its q expansion has cn = 0 for n < 0. If also c0 = 0, we call / a cusp form- In §3 we shall give a geometric interpretation for these conditions. But first we consider our standard examples G2*, A, and j. Proposition 8.1. For k > 2, the q expansion of G2jb(T) is given by Gn(r) = 2<(2*) + ||^ f] <r2fc_,(n),", where <T\{ri) — ^2d\n dl. Consequently G2k(T) is a modular form of weight 2k. Proof. We take as known the identity 1 °° / 1 1 \ 7T COt 7TT = - + 5[ + ) , (8.8) t ^Kr + m T-mJ* v } in which the convergence is uniform on compact sets. With q = e2*lT and Im r > 0 (so that \q\ < 1), we have cos itt . q + 1 2wi _ . -c-^ d IT COt 7TT = 7T — = ITT 7 = 27T — = I7T — 27TI > flf . Sin 7TT flf — 1 1 — flf *—' Thus (8.8) gives
226 VIII: MODULAR FORMS FOR SL(2, Z) Differentiating 2k — 1 times gives Hence G2t(r) = 2 1 d=l (m,n)*(0,0) V > ^ (m)2* + ^ ^ (nr + m)2fc = 2C(2*) + 2]T J2 1 ' ^-f ^-f (nr + m)2* Applying (8.9) with r replaced by nr, we see that this expression is =2«2t)+2 (2rri)i(2"r* £ £y*"V" as required. REMARK. The next-to-last line of the proof shows that also The series here is rapidly convergent and allows for approximate numerical calculations. If we replace r with u^/t^i and sort matters out, we are led to the formulas (6.50) for gi and g$ that were used for the calculations in §VI.9. Corollary 8.2. A(r) is a cusp form of weight 12 and is nonvanishing on 7i. Also j(r) is an unrestricted modular form of weight 0 with q expansion 1 °° j(r)=-+ 744+£cn*n. q n=l
3. GEOMETRY OF THE q EXPANSION 227 Proof. The nonvanishing of A(r) follows from Theorem 6.15. From (8.2) and (8.1), we have A(r) = g2{r)3 - 27g3(r)2 = 603O,(r)3 - 27 • 1402G6(r)2. Taking as known the identities ft i ^,«v ft C(4) = — and <(6) = 90 w 33-5-7' we obtain from Proposition 8.1 G4{r) = ^ + ^J-(1 + 9«2 + 28g3 + 73/ + ...) Gs(r) = -^7 ~ ^0"(* + 33?2 + 244q3 + m7q4 + • • •}- Therefore A(r) = (2*)12(9 - 24«/2 + 252q3 - 1472/ + ...), (8.11) and A(t) is a cusp form. For the q expansion of j(r), we have 1728 • 603G4(r)3 j(r) = as required. A(r) _ 1 -I- 720g + 179280g2 + 16954560g3 + ... q - 2V -I- 252g3 - 1472?4 + ... = - + 744 4- 196884? + 21493760?2 + ... (8.12) Q 3. Geometry of the q Expansion Armed with examples, let us discuss the geometry connected with the q expansion. Let R be the closed subset of W described in Figure 8.1. Proposition 8.5 below addresses the fact that R is a fundamental domain for the action of SX(2,Z) in Ji. This fact needs a little care in its formulation, since ( ~ _x J acts as the identity on H. The group 5L(2, R)/ { i ( o i J r acts effectively, and R is really a fundamental domain for the group = SL(2,1)/{±(H)}
228 VIII: MODULAR FORMS FOR 5L(2,Z) Let T and S be the images in T of the members f 1 J and [ - n 1 of 5L(2, Z). These elements are given by T(r) = r+1 and 5(r) = -l/r, (8.13) and S has order 2 (not 4). Before proving that R is a fundamental domain, we examine these elements more closely. -1/2 1/2 Figure 8.1. Fundamental domain for SL(2tl) Proposition 8.3. The elements ( n J and I n 1 generate 5L(2,Z). Proof. Assume the contrary. Let f be the subgroup of 51(2, Z) generated by f J and I , n J. Among all members f , J of 5L(2,Z) not in f, choose one with max{|a|, \c\} as small as possible. Then with max{|a|,|c|} fixed, we may assume min{|a|,|c|} is as small as possible. Multiplying if necessary by ( 1 J on the left, we may assume that \a\ < \c\. Now (-•.i)(i!)(ijr-(i!)
3. GEOMETRY OF THE q EXPANSION 229 shows that ( 1 1 is in f. Thus (I 0\(a b\( a b \ \±l l) \c d) \c±a d±b) is not in f. Unless a = 0, one of c ± a has \c ± a\ < \c\ and contradicts our construction of ( , 1. Thus we may assume a = 0. Multiplying on the left by I . J or its inverse f J, we see that 5L(2,Z) contains an element f n J not in f. This element is a power of f n 1 J, and we have a contradiction. Since / 1 and I . n 1 are in 5£(2,Z), an unrestricted modular form / of weight k satisfies /(-l/') = (-r)V(r). On the other hand, Proposition 8.3 implies the following converse to the validity of (8.14). Corollary 8.4. An analytic function / on H that satisfies (8.14) is an unrestricted modular form. Proof. For g = ( a * J in 5L(2,R), let 6(gyT) = cr + d. (8.15) Direct computation gives %i02,r) = 6(gug2r)6(g27T) (8.16a) and %-1,r) = %lfl-1r)-1. (8.16b) If (/i and 02 are in SX(2, Z) and are elements y such that f(jr)=6{y,T)kf(T), (8.17)
230 Vni: MODULAR FORMS FOR 5L(2,Z) then (8.16a) says that (8.17) holds also for 7 = g\g2- If <7 is an element 7 such that (8.17) holds, then (8.16b) says that (8.17) holds also for 7 = g~l. Thus the subset of 5L(2, Z) for which (8.17) holds is a subgroup. Equations (8.14) say that this subgroup contains I n .1 and (-0. i> By Proposition 8.3 the subgroup is all of SX(2, Z). Theorem 8.5. (a) Each point of H can be mapped into R by some element of T = si(2,z)/{± (;;)}. (b) The only points of R that are equivalent with one another under T are the points r and r + 1 of the vertical sides, and the points r and — 1/r of the circular arc. (c) The only points of R that are fixed by a member 7 ^ 1 of T are r = i (fixed exactly by the subgroup {1,5}), r = p = e2"/3 (fixed exactly by the subgroup {1,5T,(5T)2}), and r = -p = c*1"'3 (fixed exactly by the subgroup {1,T5, (TS)2}). Proof, (a) If r = p + icr is given, then we can compute that Im —T^ = 1 , ,112 • (818 Since c and d are integers, some choice of I ,1 makes \ct -f d\2 a minimum. Applying this element, we obtain r' such that Im r' > Im 77' for all 7 € T. For a suitable choice of n, the translate r" — T^r' will have |Re r"\ < \> and it will still be true that Im r" > Im jr" for all 7 E T. If t" were strictly below the unit circle (at the bottom edge of R)} we would have Im r" < Im St", contradiction. We conclude that t" is in R. (b,c) Suppose r is in R and g = ( , 1 is an element of 5L(2, Z) with gT in i2. Let 7 be the image of g in T. Since jt and 7-1(7r) are in R> there is no loss of generality in assuming that Im(7r) > Im r. By (8.18), \ct + d\ < 1. Since Im t > }, we cannot have \c\ > 2. Thus c = 0, 1, or — 1. If c = 0, then 7 must be a power of T, and the points in question are r and r ± 1 as in (b). Since f , J and f _ . J have the same image in T, it remains to consider c = 1. Then \t + d\ < 1. Since |Re t\ < ^, we must have
4. DIMENSIONS OF SPACES OF MODULAR FORMS 231 \d\ < 1. Thus d = 0, 1, or -1. Suppose d = 0. Then |r| = 1. Since ad — be =1,6 = — 1. Hence g = ( n )' anc* 7r = a • This is an integer translate of the point , which is on the bottom edge of R T since \t\ = 1. The possibilities are that a = — 1 and r = p as in (c), or a = 1 and r = —p as in (c), or a = 0 as in (b) and the case r = i of (c). Suppose d — 1. Then r G R and |r + 1| < 1 imply t — p. In this case fa b\ *u a 1 c ap + a-1 1 . '=(,1 lj witha-6=LSoTP= p+1 =«-J^ = * + P- Hence a = 0 or 1. If a = 0, we have g = I . 1 as in (c). If a = 1, then the points are p and p + 1 as in (b). Suppose d = — 1. Then similarly r = —p and 7(—p) = a 4- (—p) with a = 0 or — 1. If a = 0, we obtain g = ( 1 as in (c). If a = 1, then the points are —p and (—p) + 1 as in (b). The fundamental domain allows us to interpret the q expansion of a modular form as an expansion at oo. In fact, we see from Figure 8.1 that the only way to tend to oo in R is vertically. Thus the point r = p + ia has a —► oo while p remains bounded. The effect on q = e2ntT is for q to tend to 0. So a power series in q may be regarded as an expansion about r = oo. If the qr expansion does not have infinitely many negative powers of q) the condition that an unrestricted modular form be modular is a growth condition: There is to be no pole, and the form is thus to be bounded as cr —► oo. The additional condition for a cusp form is that the form vanish as a —► oo. This condition can be checked by evaluating the integral defining cq in (8.7b). 4. Dimensions of Spaces of Modular Forms Let / ^ 0 be a modular form of weight k. For r in W, let vT(f) be the order of vanishing of / at r. If 7 is in SX(2, Z), then vyT(f) = vT(/) since f(yr) = (cr + d)kf(r). Let vOQ(f) be the order of vanishing of the q expansion of / (the expansion at 00). Theorem 8.6. If / ^ 0 is a modular form of weight Jb, then »„(/) + W/) + 5*v(/) + £' Mf) = 4 r where 52' refers to a sum over inequivalent points in the fundamental domain R other than 1 and p.
232 VIII: MODULAR FORMS FOR SL(2,Z) REMARK. The left side is indeed finite. There can be no zeros of / in a deleted neighborhood of oo, i.e., beyond some height in R. The remainder of R is compact, and / can have only finitely many zeros there. Essentially we shall apply the Argument Principle, computer-: V ,./ v dr over the boundary of R with positive orientation. 2*i J /(r) Proof ing Actually we have to adjust our contour, introducing detours to avoid zeros. First assume that there are no zeros on the boundary of R except possibly at t and p. We introduce the modified contour in Figure 8.2. It is assumed that all the zeros of / within R, other than those at i and />, are within the contour in the figure. The arcs BB't CC, and DD' are circular. l-'B DM Figure 8.2. Contour for calculating number of zeros We shall make repeated use of the following change of variables formula. If t = </>(t') is an analytic change of variables, then (f0<f>)'(T')dr' = f'(r)dr (/o^(r') /(r) • (8.19) For a first application of this formula, we take r' = q and write f(q) = /(r). As r traverses EAy q goes once around a circle with negative orientation. Hence 1 f f'(r)dr _ 1 ff'(g)dq_
4. DIMENSIONS OF SPACES OF MODULAR FORMS 233 Next the points of AB are paired with those of ED' by r —► r + 1, and /(r + 1) = /(r). Hence JLt f'^dr , l f f'^dr-o rs2n mJab f(r) + 2*iJD.B f(r) "u' (*'n) The arc B'C is paired with the arc DC by r —► r' = — 1/r. We take <j> = S in (8.19) and obtain / /'(r)«fr /• r(r)dr / (/o5)'(r)dr Jc-D f{r) JDC- f(r) JB,c (/oS)(t) = _ / £[(-*-)*/(r)]dr 7b'C (-r)*/(r) -_ / f(T)dr f *^i Hence J_ / Ar)rfr 1 / /'(r)ifr t / £ = * ofn 2*i y^c /(*0 2xi y^D /(r) 2xi 7^c r 12 ^ V ;' (8.22) where o(l) is a term that tends to 0 as the radii of the arcs BBfy CO, DD' tend to 0. To handle the integrals over the small arcs BBf, CO, and DD'y we note the following fact from complex variable theory. Let Hea mero- morphic function with a simple pole at z0 and with residue h-\ there. Over a small positively oriented circular arc centered at z0 and consuming a fraction 9 of the circumference of the circle, we have -^ / h{z)dz^eh_l^o{\). (8.23) *wt Jarc [In fact, we have only to expand h(z) in Laurent series and compute the integral term by term.] Applying (8.23) to the small arcs in Figure 8.2 and taking into account orientations, we have
234 VIII: MODULAR FORMS FOR SL(2,Z) We add these three formulas to the sum of (8.20), (8.21), and (8.22), and we apply the Argument Principle. The result is £ 'vT(f) = -««,(/) + 0+ ^ -jM/)-^(/)- \">(f) + "(1). r Letting the radii of the small arcs tend to 0, we obtain the desired result. We still have to treat the case that / has some zeros on the boundary of R other than at p and i. In this case we introduce a circular bubble at each such point, with S or T of that bubble at the congruent point on the boundary of R. The bubbles on the vertical sides of R do not affect the above argument, and the bubbles on the circular bottom side of R introduce error terms that are o(l). This completes the proof. Let Mk be the vector space of modular forms of weight jfc, and let Sk be the subspace of cusp forms. These spaces are 0 for k odd, since /W = /((-.I_°0')=(-l)V(r). For even k > 4, Proposition 8.1 shows that G* is in Mk but not S*. On the other hand, Sk has codimension at most 1 in Mk, being given by a single condition cq = 0. Thus Mk = Sk © CG* for k even > 4. (8.24) Corollary 8.7. Let k be even. (a) Mk = 0 for k < 0 and k = 2. (b) Mo = CI, and Mk = CG* for 4 < k < 10. (c) Multiplication by A(r) defines an isomorphism of Mfc-12 onto Sk- Proof, (a) In Theorem 8.6, all terms on the left side are > 0. Thus the right side is > 0, and k > 0. For k = 2, the right side is |. No nonnegative integer sum n\ + ^2 + \n$ can equal £, and so no / ^ 0 can exist. (b) When k < 12, the right side is < 1. Thus Voo(f) = 0. Hence Sk = 0. If 4 < k < 10 and k is even, Proposition 8.1 shows that Mk = CG*. If k = 0, 1 is in Mk. Since So has codimension at most 1 in M0, Mo = CI. (c) Since A(r) is a cusp form of weight 12, multiplication by A carries Mk-12 into Sk- On the other hand, A(r) is nonvanishing on H by Corollary 8.2 (or Theorem 6.15) and has a simple zero at 00, by (8.11). Thus multiplication by A(r)"1 is a well defined linear map of S* into
4. DIMENSIONS OF SPACES OF MODULAR FORMS 235 Corollary 8.8. If k > 0 is even, then \ \ — if fc = 2 mod 12. Also Sk = 0 for k < 12; if k > 12 is even, then f ^-1 if k £ 2 mod 12 - 1 if Jb = 2 mod 12. dims* = Corollary 8.9. The q expansion of the cusp form A(r) of weight 12 can be regrouped as A(r) = (2»)12«n(l-«r- n = l More specifically let i)(r) = e'"'" f[(l -qn), n = l so that A(r) = (27r)i2r;(r)24. Then rj(r + 1) = e^'12^) and ^-l/r) = (-ir)1/2^). Remark. This formula for A(r) lends itself much better to calculation than the one in (8.2). Proof. Once we have proved the two transformation laws for ??(r), it follows that rj(r)24 satisfies the transformation laws (8.14) with k — 12. Corollary 8.4 then shows that t){t)24 is an unrestricted modular form of weight 12, and hence r;(r)24 is a cusp form of weight 12. By Corollary 8.8, A(r) = ct)(r)24 for some constant c. The coefficient of q for t)(t)24 is 1 and for A(r) is (2*)12, by (8.11). Thus c = (2tt)12. The identity tj(t -f 1) = e*xll2T){r) being obvious, we are left with proving ^(-l/r) = (-,V)1/^(r). (8.25)
236 VIII: MODULAR FORMS FOR 5L(2, Z) Let log denote the principal branch of the logarithm cut on the negative reals. Since \q\ < 1, -Mi-,') = £!,". fc=l Then 77(r)e-Wr/12 = f[explog(l-^ oo = exp^log(l-tf') El qk "Tl * = cxpE-i(c-«^-i)- Replacing r by — 1/r gives ,(-l/r)eW(-)=exp|:-i(;i^r-T). So l(r)/ij(-l/r) __ zi(T+r-») v-^ 1 / 1 1 \ - c 12 exp ^ ^ ^.a,,.^ _ j c2,ifc/r _ j y • In order to prove (8.25), it is therefore enough to prove that Fix t. Put f(z) = (cot z)(cot f) and v = (n+ ±)tt with n = 0,1,2,.... The two factors comprising / are together singular only at z = 0. Thus
4. DIMENSIONS OF SPACES OF MODULAR FORMS 237 z~lf(i/z) has simple poles at z = ±— and ± for k ^ 0, and the respective residues are — cot I — J and — cot(7rikr) for k = 1,2, Also there is a triple pole at z = 0, and the residue of z~l f(i/z) at z = 0 can be seen to be — ^(t + t-1). Let C be the curve in the z plane in the shape of a parallelogram, passing from 1 to r to —1 to —r to 1. Those poles mentioned above that lie within the parallelogram are actually on the diagonals of the parallelogram. Thus the Residue Theorem yields £ £, sdz 2iri, lx 2-2irtx^l/ A /irJb\ . . A £/(,*) T = "^ + 0 + —g- (cot (-] +cot(ir*r)J . (8.27) Let us consider the limiting behavior of f[yz) on each edge of C as n —► +oo. If we parametrize the edge from 1 to r by z = tr + (1 — t) with 0 < t < 1, then we can write i /re2.(n + iM*r-|-(l-t)) + A cot „, = cot(n + |)„ = VeM(n^MtT+(1_0) _ / • (8.M) Let r = p + icr. The exponential here has magnitude c-2(n+*)irta (g 2Q) and tends to 0 pointwise (except at t = 0) as n —> +oo, since <r > 0. Thus cot vz —* — i on this edge. Let us show that the convergence occurs boundedly. Let c > 0 be a number to be specified. When t > r, (8.29) is < <r2T£(7, which can be assumed to be < 1; then (8.28) is bounded. When t < «-, the el9 part of the exponential is e2<(n + $)7r(<H-(l-0) _ ci»[(2n+l)+*(2n+l)(p-l)] _ _ei><(2n+l)(/)-l) The exponent on the right is controlled. For a suitably small €, we can make it < 7r/2 in absolute value, and we see that the denominator of (8.28) is bounded away from 0. Therefore (8.28) is bounded, and the convergence occurs boundedly. Similarly we find that cot ~ is bounded on this edge and tends pointwise to i. By dominated convergence lim / fli/z) — = / — = logr. n-°°Jl Z Jx Z
238 VIII: MODULAR FORMS FOR 5L(2, Z) Similar computations give r1 „ x dz r1 dz . 1 km / f(vz) — = - / — = -*"* + log r n-*°°JT z Jr z hm / /(^) — = / — = log(-r) + n-t lim I* f{vz)^ = -ll ^ = log(-r). Since logr 4- log(—r) = 2 log (y), passage to the limit in (8.27) gives 41og (I) = -?£(t + r"1) + 4i£) I (cot(^r) + cot (^)) 2iri. _j c^v 1 / 1 1 \ This proves (8.26), and we have seen that (8.25) follows. 5. L Function of a Cusp Form In this section we shall associate to each cusp form an L function given by a Dirichlet series. We shall see that the L function extends to be entire and satisfies a functional equation. The prototype for this study is Theorem 7.19, where we passed from certain 0 functions to L functions and proved similar results about the L functions. Let / E Sk be a cusp form, and let /(r) = X^nLi cnQn he its q expansion. The L function of / is the Dirichlet series L(s>f) = Y,%- (8-3°) n = l The L function can be obtained from / by applying a Mellin transform. If F : (0,oo) —► C is given, the Mellin transform of F is the function g(s) defined by g{,) = J F(0<* j
5. L FUNCTION OF A CUSP FORM 239 for all values of s for which the integral converges. (For s imaginary, the Mellin transform is the version of the Fourier transform appropriate for the multiplicative group R+, but we shall not use this fact.) For our cusp form /, let us write r = />+ ia and compute the Mellin transform of /(ur), with p fixed at 0. We proceed formally for now, disregarding convergence. We have „(.)=/ f(ia)a'-= £c„ Jo <r Jo £[ = (2ir)-rwf;^ „ 2*no* d<T € <T 1 n n = l = (2»)-T(«)L(«,/). (8.31) We shall show below that the coefficients cn of a cusp form satisfy \cn\ < Cnkl2. Then the Dirichlet series (8.30) converges absolutely for Re s > ^ + 1, and L(s, f) is analytic in this region. Going over the steps of (8.31), we see that our computation was rigorous in this same region. Lemma 8.10. Let / € 5* have q expansion /(r) = Y^=\cn9n- Then (a) the function <p{r) = \f(r)\<Tk^2 is bounded on H and invariant under 5L(2,Z) (b) |cn|<Cn*/2. PROOF, (a) From/(r) = £~=1 cnqn for \q\ < 1, we have |/(r)| < C\q\ for \q\ < \, i.e., |/(r)| < Ce-2" for a > £log2. (8.32) Consequently <p(r) = \f(r)\<Tk^2 tends to 0 as r tends to oo through the fundamental domain R for SL(2,Z). Since ip is continuous and the part of R with <r < -^ log 2 is compact, y> is bounded on R. On 7i} <p satisfies <p(r + 1) = V?(r) and vH)H/H)lfe)"!=i'i'l/wiw"1)"i = l/(r)kfc/2 = ¥>(r).
240 VIII: MODULAR FORMS FOR SL(2y Z) By Proposition 8.3, <p is invariant under SX(2, Z). Being bounded on R} <p must be bounded on W. (b) We have cn = /_, /(r)e~2T,nr dp, and we have just seen that |/(r)|<C<r-*/2. Hence knl < Ca-We2™* for all a > 0. Taking <r = I, we get |cn| < Ce2wnk/2 as required. Remark. For fixed <r, the function f{p+ i<r) is smooth and periodic in p. Thus its Fourier coefficients decrease faster than any negative power of n. The numbers cn are not the Fourier coefficients of /(/> + tV), however, but are e2xno times those Fourier coefficients. Theorem 8.11 (Hecke). If / € S* is a cusp form for S£(2,Z), then the L function L(s,f)y initially defined for Re s > |» + 1, extends to be entire in s. Moreover, the function A(S,f) = (2ir)-T(s)L(s,f) (8.33) satisfies the functional equation A(s,/) = (-l)fc/2A(* -*,/). (8.34) Proof. The transformation law for f j is /(—1/r) = rk/(r), and we specialize this to r = p -f ia with p = 0 to obtain /(i/cr) = ik*kf(icr). (8.35) From (8.31) we have ,/)= [°° f(i<r)<r'-1 d<r (8.36) Jo A(s for Re 5 > ^ + 1. The argument now proceeds in the same way as for Theorem 7.19c. The formal argument, disregarding convergence, is to make the change of variables a —► a"1 in (8.36) and apply (8.35). To make this argument precise, we use the estimate (8.32) to see that /•OO / f(i<r)as-ldcr
A(. 6. PETERSSON INNER PRODUCT 241 converges for all s € C and defines an entire function. For Re s > | + 1, we rewrite (8.36) as A(SJ) = / f{ia)a9~l da+ f f{ia)a9~l da and transform just the first term, replacing a by a"1 and applying (8.35). The result is ,, /) = ik f°° f(ia)ak'9"1 da + f°° f(ia)a9~l da. (8.37) The first term is of the same form as the second and thus extends to an entire function of s. Hence A(s,/) extends to an entire function of s. Since T(s) is nowhere 0, L(s}f) is entire. Replacing s by k - s in (8.37) gives, upon multiplication by i*, ikA(k -sj) = (-1)* f^ f(ia)a'~l da + ik f°° f{ia)ak—x da. The right side here matches the right side of (8.37) since Sk ^ 0 only when k is even, and (8.34) follows. 6. Petersson Inner Product In this section we shall introduce a Hermitian inner product on the vector space Sk of cusp forms for 5L(2, Z). Lemma 8.12. The measure —r— on H is invariant under 5L(2,R). a2 Proof. In terms of r and f, we have dp Ada = -r{dr A df). So it is enough to prove that ~T_J'" 2 is an invariant 2-form. For g = I , 1, dr Adr . (Im r)2 r\/ —\ the Jacobian determinant \, _x is the determinant of a diagonal 0(r, f) matrix with diagonal entries (cr + d) 2 and (ct -f d)"2. Thus d(gr) A d(gf) = d^ QT} drAdf= \ct + d\~4dr A df. d(r, r) From (8.18) we have lm(gr) = Icr + dl2' and the lemma follows.
242 Vni: MODULAR FORMS FOR S£,(2,Z) For / and h in Sk, we define (/.*>= I f{r)W)°k^£-, (8.38) Jr °* where R is the usual fundamental domain in H for SL(2, Z). By Lemma 8.10, f{r)h(T)<rk is bounded, and therefore the integral is convergent. The result is a well defined Hermitian inner product called the Peters- son inner product on 5*. By Lemma 8.10a, \f(r)\2ak is invariant under SL(2,Z). In view of Lemma 8.12, the measure |/(t)|2<t* —r— on H is invariant under 5L(2,Z). It follows that if we cut R into countably many pieces,* map each piece by a member of SL(2,Z), and reassemble the result as a new fundamental domain R!', then / |/<r)|V^=/|/(r)|V^. Jw * JR * Any measurable fundamental domain for 5L(2,Z) can be obtained in this way, and thus (/, /) is independent of the choice of fundamental domain. By polarization, the same conclusion holds for (/, h). We summarize as follows. Proposition 8.13. On S*, the Hermitian inner product </,A)=//(r)W^£ JR & is independent of the fundamental domain for SL(2,Z). 7. Hecke Operators Hecke operators are a certain kind of linear operator from M* to M*, with St mapping to St • We shall see that these operators commute and are self adjoint relative to the Petersson inner product. Consequently Sjb has an orthogonal basis of simultaneous eigenvectors for the Hecke operators. The end result in §8 will be that the L function of each such eigenvector (normalized to have C\ = 1) has a quadratic Euler product expansion. "Technically we should first remove the part of the boundary of R where the imaginary part is positive, so that R is an exact fundamental domain.
7. HECKE OPERATORS 243 A lattice in C is all integer combinations of a pair of complex numbers linearly independent over R. Recall from §2 that our first examples of modular forms came from homogeneous functions of lattices: G2fc(A), 02(A), 03(A), A(A). Effectively what was shown at the beginning of §2 is the following: Let / be a complex-valued function whose domain is the set of all lattices and which is homogeneous of degree —k in the sense that /(oA) = a"*/(A) for a 6 Cx. (8.39) Define / on the upper half plane H by /(r) = /(AT). (8.40) As in the special cases of §2, / satisfies '((c d)T)=(CT + d)kf(T) for (c j)eSL(2,Z). (8.41) Conversely to any / : 7i —► C satisfying (8.41), we can associate a function / of the lattice variable A, homogeneous of degree — fc, by the definition /(Iwi 0 Z^2) = u>r*/(^2M) if Im(cj2M) > 0. (8.42) Let us check that this function depends only on A, not on the basis. If we have (»7\-(* *\(uA WJ-\c d)\^)' then /(Zw/1©Za;5)=a;/f*/(w2/wI) = (cw2 + duji)~k(c(u2lui) + d)kf(LJ2/ui) = Wlkf(w2/Ul) = f(2u>i © Zu>2), and the independence follows. As a consequence of (8.42) and this independence, /(A) has the homogeneity property /(<*A) = Q~kf(A) for aeCx.
244 VIII: MODULAR FORMS FOR 5L(2,Z) Our initial definition of Hecke operators will be in terms of homogeneous functions of lattices. Then we shall translate the definition into the language of modular forms. Let C be the free abelian group freely generated by the lattices in C. To avoid confusion, we shall write n-A or n(A) for n times the generator in £, and nA or (nA) for the dilate of A by the factor n. The Hecke operator T(n) on lattices, for n = 1,2,3,..., is defined to be the map T(n) : C —► C given by T(n)A= £ A'- (843) [A:A']=n where [A : A'] is the index of A' in A. This is a finite sum, since any such A' satisfies nA C A' C A and so corresponds to a subgroup of A/nAE*Zn©Z„. To define Tk(n) on an element / of the space M* of modular forms of weight ife, we let /(A) be the function of lattices given by (8.42), so that /(A) satisfies (8.39). Then 7*(n)/, as a function of lattices, is given by (Tk(n)f)(A) = nk-1 £ /(A'). (8.44) [A:A']=n It is clear that 7*(n)/ is another function of lattices homogeneous of degree — k. Shortly we shall exhibit the corresponding function of the complex variable r, denoting it by 7*(n)/, and we shall verify that it is a modular form. The resulting operator T*(n) on modular forms is also called a Hecke operator. To compute Tjb(n) explicitly on functions on lattices, let A = Zu>i®Zu>2 with Im u)2/ijJ\ > 0, and let A' = Zuj'x ® Zu^ with Im u>f2/wi > 0 be a sublattice of index n. If we write (3)-(:})(£)■ <8-«> then rj) will be an integral matrix of determinant n. The most general properly ordered basis of A' is obtained by left multiplying the left side of (8.45) by a member of 5L(2,Z), and thus the right coset 5L(2,Z) [a ,) describes all the matrices leading from {w^u/2} to A'. Let M(n) be the set of all integral matrices of determinant n. We have seen that there are only finitely many lattices A' of index n in A. Consequently there are only finitely many right cosets 5L(2, Z)ot in M{n) with detor = n. Thus we have a (disjoint) right coset decomposition M(n)= [j SL{2,l)ai (8.46) 1=1
7. HECKE OPERATORS 245 with i/(n) < oo. To write Tk(n)f as a function of r, we make the definition (/ o [a]k)(r) = /(ar)(cr + <f)-*(det a)*/2 (8.47) •■(: i) for or = ( , J a matrix of positive determinant (so that r 6W implies a€H). For an analytic function /onH, the condition that / be an unrestricted modular form of weight k is that / o [7]* = / for all 7 in SL(2,Z). The formula for Tk(n)f involves this definition for matrices of determinant n. Proposition 8.14. Let {<*,-}jli be a complete set of representatives for the right cosets 5L(2,Z)a of 5L(2,Z) on Af(n). If/ is in Mky then Tk(n)f is given as a function of r by *(n) Tfc(n)/=n^1^/o[ai]ib. (8.48) 1=1 Hence Tk(n)f is an unrestricted modular form of weight k. PROOF. The left side of (8.44) at Ar is just Tk(n)f(r). Let [Ar : A']= n and let or,- = 1 , ] be the coset representative corresponding to A'. This means that «)-(:i)(0 is a basis of A'. Then /(A') = /K(Ze(«4M)Z)) = (cr + «f)-*/(f^) = (detai)-*/2(/o[ai]fc)(r) = n-*/'(/«> WOW. Substituting into (8.44), we obtain (8.48). This proves the analytic- ity of (TJfc(n)/)(r) in 7iy and the transformation law follows from the homogeneity of Tk(n)f as a function of lattices. In order to examine the q expansion of T^(n)/, we shall use an explicit set of coset representatives.
246 VIII: MODULAR FORMS FOR SL(2t Z) Lemma 8.15. The matrices ( n ,1 with ad = n, d > 0, and 0 < 6 < d are a complete set of coset representatives for the right cosets of5L(2,Z)onM(n). Proof. Let ( , „ J be given. Choose relatively prime integers x and y with xa! + yc' = 0, and then choose relatively prime integers u and v with uy + v(—x) = 1. Then ( J is in 5L(2,Z), and /u tA (a' b'\ \x y){c' d') / a'1 bN \ has lower left entry 0. Let us call the resulting matrix 1 ~ ,„ 1. Possibly left multiplying by ( n J, we may assume that a" > 0 and d" > 0. Choose integers q and r with b" — d"q 4- r and 0 < r < d". Then A -q\ (</' b"\_(a" r\ \Q 1 A ° <*"/ V 0 d'V is a coset representative of the kind described in the lemma. Now suppose that two of the elements in the lemma are in the same right coset. Say (u v\ (a b\ _ (a1 b'\ {x y) \0 d)-{0 d!) with uy—vx — 1. The lower left entry forces x = 0. The determinant forces uy = 1. Consideration of signs forces u = y = 1. Finally the inequalities on b and 6' force v = 0. Thus we have a complete set of representatives. Proposition 8.16. Let / in M* have q expansion /(r) = 5Z^=o cnQn- Then Tk(m)f has q expansion Tk(m)f(r) = jr/bnq", n=0 where co<rk-i(m) ifn = 0 bn = { cm if n = 1 £a|GCD(n,m) ak~lcnm/a> if H > 1, with (Tjk^x as in Proposition 8.1. Consequently Tjfc(m) carries M* to M^ and carries S* to 5jk-
7. HECKE OPERATORS 247 REMARK. We shall make serious use of the formula for bn only in the case n = 1. Proof. We apply (8.48), using the representatives in Lemma 8.15. If or = ( n , 1 is such a representative with ad = m, then fo[a}k(r) = f(^)d-kmk^ = f^cne**in(aT+bVdd-krnkf2. n=0 Hence oo 7k(m)/(r) = m*"1 £ £ <r*c„e2*<"(aT+*>/<'> (8.49) n=0a,6,<* with the inner sum over a, 6, d as in the lemma. The sum d-\ 2> 02xinb/d 6=0 equals <f if <f | n and equals 0 otherwise. So we can drop all n's except those of the form n = ld> and then (8.49) is /=0 ad=m /=0 aim <*>0 a>0 The coefficient of qr° comes from / = 0 and is a\m a>0 The coefficient of q1 comes from a = / = 1; it is just cm. For n > 2, the coefficient of qn comes from triples (lya}d) with la = n and a \ m. The factor cim/a is cnm/ai with a | n and a | m. Thus the coefficient of qn is a|GCD(n,m) as asserted. From the formula for 6n, we see that Tk(m)f is a modular form; hence 7]fe(m) carries M* to M*. If c0 = 0, then bo = 0; hence Tk(m) carries St to S^-
248 VIII: MODULAR FORMS FOR 5L(2,Z) Corollary 8.17. Let / in M* have q expansion /(r) = 5Z^L0cn9n- For p prime, Tk(p)f has q expansion Tk(p)f(r) = 5^L0 bnqn with . _ f cPn if P t n \ cpn+pk-lcn/p if p\n. Now we examine how the T*(n) interact with one another. We begin with the operators T(n) on £. Let R(n) : C —► C be given by rt(n)(A) = (nA). It is clear that R(n)T(m) = T(m)R(n) for all n and m. Lemma 8.18. (a) For a prime power pr with r > 1, 7V)T(p) = T^*1) + p • fl^TXp'-1). (8.50) (b) T(m)T(n) = T(mn) if m and n are relatively prime. Proof, (a) Both sides associate to A some sublattices A' of index pr+1, and we have to check that the multiplicity of a sublattice A' is the same for both sides. Fix such a A'. First suppose A' C pA. Then R(p)T(pr-1) = T(pr~1)(pA) contains A' with multiplicity one. Therefore A' occurs on the right side of (8.50) with multiplicity p + 1. On the left side, T(p)A is the sum of the p + 1 lattices strictly between pA and A, and so A' occurs on the left side with multiplicity p + 1. Now suppose A' is not contained in pA. Then it occurs on the right side with multiplicity one, coming only from T(pr'^1). On the left side it occurs with multiplicity at least one. If it occurs with multiplicity greater than one, then there are two distinct sublattices Ai and A2 of A of index p with A' C Ai and A' C A2. Then A' C Ai fl A2 = pA, contradiction. This proves (a). (b) If A' is a sublattice of index mn in A, then A/A' is finite abelian of order mn. If m and n are relatively prime, we can write A/A; = Gi © G2 uniquely with |Gi| = n and |G2| = m. If A" denotes the pullback to A of the members of Gi, then A" is the unique lattice with A' C A" C A and [A : A"] = m. Hence A' occurs on both sides of T(m)T(n)(A) and T(mn)(A) with multiplicity one.
7. HECKE OPERATORS 249 We shall translate Lemma 8.18 into a statement about the Hecke operators on modular forms. The lattices that occur in the definition of T(n)(A) in (8.43) have index n in A, and we shall say that T(n) : C -► C is an operator of degree n. More generally a linear operator N : C —+ C of the form N(A)= £ mA,(A') (8.51) [A:A']=n will be said to be of degree n. In this sense, R(n) has degree n2. If N : C —► C has degree n and is given as in (8.51), we define Nt on functions / on lattices of homogeneity — k by (Nkf)(A) = n*"1 £ mA./(A'). (8-52) [A:A']=n This definition is consistent with (8.44), and it satisfies the properties (S + N)k = Sk + Nk (8.53) if 5 and N both have degree n, and (MAT)* = WbMfc (8.54) if M has degree m and N has degree n (so that Af TV has degree m -h n). Property (8.54) argues for putting AT* on the right side of / in (8.52), but the operators of interest to us will commute, and (8.54) will not be a problem. Notice that our definition (8.52) makes (J2(n)*/)(A) = (n2)*-7(nA) = n*-2/(A). (8.55) Theorem 8.19 (Hecke). On the space Af*, the Hecke operators satisfy (a) For a prime power pr with r > 1, n(pr)Tk(P) = w+i)+p*-1 w1). Hence Tk(pr) is a polynomial in Tk(p) with integer coefficients. (b) Tjb(m)7]b(n) = T*(mn) if m and n are relatively prime. (c) The algebra generated by the Tk(n) for n = 1,2,3,... is generated by the Tk(p) with p prime and is commutative.
250 VIII: MODULAR FORMS FOR SL(2, Z) PROOF, (a) Let / be in Mk. By (8.53), (8.54), and (8.55), Lemma 8.18a gives Tk(p)Tk(p')f = W+1)/+ vPk-2Tk{f-l)f. Hence Tk(p)Tk(p')f = Tk{p'+1)f + pk-1Tk(pr-1)f. Since 7jb(l) is the identity, this equation shows recursively that Tk(pr) is a polynomial in Tlk(p) and hence commutes with T*(p). Then (a) follows. (b) This follows from Lemma 8.18b and (8.54). (c) By (b), the algebra generated by the 7*(n) is the same as that generated by the Tjt(pr). By (a) it is the same as that generated by the Tjfe(p). In turn, these commute by (b). 8. Interaction with Petersson Inner Product We have proved that the Hecke operators on St commute with one another, and we shall next prove that they are self adjoint operators relative to the Petersson inner product. The proof is a little subtle, and we begin with two lemmas. For any integer N > 1, we define the principal congruence subgroup r(7V)ofSL(2,Z) by iw-{(: i)S5lp.d|(: »)-(j •) n^}. The group T(N) is the kernel of the reduction-modulo-TV homomorphism carrying 5L(2, Z) to SX(2, ZN). Since 5L(2, ZN) is finite, T(N) has finite index in 5L(2,Z). Observe that T(l) = 5L(2,Z). For a = (ac b\ in M(n), we let a' = na"1 = ( * ~6V The matrix a' is again in M{n). Lemma 8.20. If a is in M(n), then aT(nm)a-1 C T(m). Consequently T(n2) C aT(n)a-l C 5L(2, Z). (8.56)
8. INTERACTION WITH PETERSSON INNER PRODUCT 251 Proof. Let a = ( a 1 and let 7 be in T(nm). Then -= (: j) u -;) - nm ~V> n) mod nmt and -C J) aryor""1 = ( ) mod m. Taking m = 1 gives the right hand inclusion of (8.56). Taking m = n and replacing or by a' gives the left hand inclusion. Lemma 8.21. If a is in M(n)} then the group T{n)naT(n)a-1 has the same finite index in T(n) as it does in aT(n)a~l. REMARK. The index is finite since both groups lie between T(n2) and SL(2,Z), by Lemma 8.20, and since T(n2) has finite index in SL(2,Z). PROOF. Let R be a fundamental domain in H for the action of T(n) (obtained, for example, as the union of the [5L(2,Z) : T(n)] translates of R by left coset representatives of SL(2,Z)/I\n)). Let R! be a fundamental domain in 7i for the action of T(n) fl orl^n)**""1, obtained as the union of [rfnJiIXnJnalXnJa-1] dpdcr . translates of R by elements of T(n). If // is the measure —— in W, c then /i(#) = [T(n) : r(n) n alXn)*"1]/!^). (8.57) Now a# is a fundamental domain in H for the action of aT(n)a~l. [In fact, if t is given, choose 7 € T(n) with y(a~lr) E #; then (aja~l)r is in or.R. Uniqueness follows similarly.] Moreover, fi(aR) = //(£) (8.58)
252 VIII: MODULAR FORMS FOR 5L(2, Z) by Lemma 8.12 applied to the element n~l^2a of SL(2,R). Next let (otR)' be a fundamental domain in Ji for the action of r(n)nar(n)Qr_1, obtained as the union of \glY{ti)ol-x : T(n) n aT(n)a'1] translates of aR by elements of aT(n)a~l. Then fi((aRY) = (ar(n)a-1 : T(n) n aT(n)a-l]fi(aR). (8.59) Since Rf and (aR)' are fundamental domains for the intersection, the same argument as at the end of §6 shows that fi(R') = fi((aR)f). Comparing (8.57) and (8.59) and using (8.58), we obtain the result of the lemma. Theorem 8.22 (Petersson). The Hecke operators Tjt(n) on the space of cusp forms S* are self adjoint relative to the Petersson inner product. Proof. By Theorem 8.19, it is enough to prove the result for n equal to a prime p. We shall use Lemmas 8.20 and 8.21 and the notation in the proof of Lemma 8.21. Also, as in the proof of Corollary 8.4, we let 6(g, r) = cr+d if g = (abd) has determinant one. If / and h are in 5*, we are to prove that / Tk{p)f(T)W)*k dj£-=i firmm*' d~^- (8-60) jr ° jr * By Proposition 8.13 it is enough to prove that / Tk(P)f(T)W)*k dj£ = /. fWWmo* &£, (8.61) JR °r JR * since each side of (8.61) is the same multiple of the corresponding side of (8.60). Let {a,} be right coset representatives of M(p) relative to 5L(2,Z), as in (8.46). For each choice of or,- as in Lemma 8.15, the element a< = par"1 has ot[ = 7,Qr,7l- for some 7, and 7,- in SL(2,Z). Namely «,= (J J) has «$=("* "o1)*^ J) and -(J!) -■s-(!-,,M-0, i)-
8. INTERACTION WITH PETERSSON INNER PRODUCT 253 Since / o [t,--1]* = / and h o [yfa = h, Lemma 8.12 shows that / f(r)ho[aMry &£ = / . /(r)*o [7.a.*U(r)<r* ^ JR <* Jy[R ff = /./(r)fco[7laj7at(r)<r»^ JR * JR <? In view of Proposition 8.14, proving (8.61) therefore reduces to proving / / • [«]»(rJSfrV ^f = / /(r)ftoM»(V ^ (8.62) JR <? JR ff for each a = c*j. We shall change variables on the left side of (8.62), replacing ar by a new variable r1. Note that or# = n-1^2ar has determinant 1 and satisfies /°H* = f°[a*]k- Thus (8.16) and (8.18) give / o [«]t(r)fcOV = / o [o*]t(r)Af7)(Im t)* = f(a*T)h(a*-1(a#r))6(a*, r)-*(Im r)* = /(a#r)Ma#-'(a*r))^#,r)1 (|f(a*,r)|a)* = /(a#r)/i(a#-»(a#r)) 6(a#-\a*T) -*(Im a#r)* = /(r')Ao[a']t(r')(Im r')*. Taking Lemma 8.12 into account, we obtain / / o [a]k(r)Wy &£ = / . /(r)ftoMt(rK ^ (8.63) JR a JqR <* Now / satisfies f(jr) = 6(y, r)kf(r) for 7 € ar(p)a~1 since arr(p)of_1 C 51.(2, Z) by Lemma 8.20. Also we readily check that /* o [<x']k satisfies h o [a']t(7r) = 6(7, r)*A o [a']t(r) for y 6 ar(p)ar-\
254 VIII: MODULAR FORMS FOR 5L(2,Z) since a/(«r(p)a-1)a/"1 = T(p) C 5L(2,Z). By (8.18), the integrand on the right side of (8.63) is invariant under al^pJaT1. Since aR is a fundamental domain for aT(p)a~l, the right side of (8.63) is unaffected by replacing aR by any other such fundamental domain. Thus we have Jr <* tar(p)a-t:rfr)nar(p)a-x] S(aky ^^^^ ^ (8.64) Meanwhile, on the right side of (8.62), / satisfies f(yr) = 6(y,r)kf(r) for 7 € r(p), and we readily check that h o [a']* satisfies h o [a']*(7r) = 6(7, r)kh o [a']k(r) for 7 e T(p) since a/r(p)or'""1 C 5L(2,Z) by Lemma 8.20. Thus the integrand on the right side of (8.62) is invariant under T(p). Since R is a fundamental domain for T(p), the right side of (8.62) is unaffected by replacing R by any other such fundamental domain. Thus we have / /<r)fco[«'M'V ^ = Jr * The numerical coefficients on the right sides of (8.64) and (8.65) are equal, by Lemma 8.21. Also the common integrand is invariant under T(p) r\ar(p)a~ly and R! and (aR)' are two fundamental domains for this group. Therefore the two integrals are equal. TVacing back, we see that (8.62) is a valid equality, and the theorem follows. We have now shown that the Hecke operators Ti(n) are a commuting family of self adjoint operators on the space S* of cusp forms for 5L(2, Z) of weight k. Consequently Sk has an orthogonal basis of simultaneous eigenvectors.
8. INTERACTION WITH PETERSSON INNER PRODUCT 255 Proposition 8.23. Let / E S* be a simultaneous eigenvector of Tk(n) with Tk(n)f = \(n)f. If / has q expansion f(r) = £~=1 cnqny then cn = A(n)ci. (8.66) Consequently (a) / £ 0 implies c\ ^ 0 (b) the system of eigenvalues {A(n)} determines / up to a scalar. PROOF. In the q expansion of Tjt(n)/, the coefficient of q has to be \(n)c\. But Proposition 8.16 gives it as cn. Thus (8.66) follows. Then (a) and (b) are immediate consequences. Suppose now that / € St is a simultaneous eigenvector of 7i(n). Proposition 8.23 allows us to normalize / so that the q expansion f(r) = Yl™=icnQn nas ci = 1- Then (8.66) says that cn is the eigenvalue of Tfc(n). From Theorem 8.19 we see that CprCp = Cpr+i +pfc_1cpr-i for p prime (8.67a) cmcn = cmn if GCD(m, n) = 1. (8.67b) Let us return to the L function of /, defined in §5 by According to §VII.2, the condition (8.67b) means that L(s,f) has an Euler product. By Corollary 7.8, (8.67a) means that the Euler product is quadractic, the pth factor being 1 l-Cpp-'+p*"1"2'' (Here we are taking dp = — p*""1 in (7.23).) We summarize as follows. Theorem 8.24 (Hecke-Petersson). The space S* of cusp forms has an orthogonal basis of simultaneous eigenvectors under the Hecke operators Tjt(n). Each such eigenvector / can be normalized so that its q expansion f(r) = Y^=zicnQn has C\ = 1. With such a normalization the coefficients cn satisfy (8.67). Moreover, the L function L(s,f) has an Euler product expansion L(i'/)=.Si'->y-V-'- p prime (8.68) convergent for Re s > | + 1.
CHAPTER IX MODULAR FORMS FOR HECKE SUBGROUPS 1. Hecke Subgroups In §VIII.8 we already defined the principal congruence subgroup T(N) of SL(2} Z), for any integer N > 1, by (9.1) This is the kernel of the reduction-modulo-TV homomorphism carrying ST(2,Z) to 5L(2, Z//) and therefore is a normal subgroup of finite index in SL(2, Z). The Hecke subgroups are T0(N) = |Y° J) e SL(2, Z) I c = 0 mod n\ . (9.2) These satisfy r(N)cr0(7V)C5L(2,Z) (9.3) and hence have finite index in SL(2,Z). We can compute the index of To(N) by relating this subgroup to the set M(N) of integer matrices of determinant N. Let M*(N) be the subset of primitive such matrices I a J, those with GCD(a, b,cyd) = 1. Lemma 9.1. With T = 5L(2,Z), M>) = r(o !)r = Urtt (94) disjointly, where or runs over all integer matrices (" J with ad = n, a1 > 0, 0 < 6 < dy and GCD(a,6, d)=l. The number of such right cosets [M'(n):r) = n JJ 0 + ?)• (9-5) p prime p|n 256
1. HECKE SUBGROUPS 257 Proof. If we multiply a nonprimitive member of M(n) on either side by a member of T, the result is still nonprimitive. Taking complements, we see that M*(n) is preserved under these operations. The conclusion that M*(n) = T f"J)r is elementary divisor theory: In the proof that a doubly generated subgroup of a free abelian group of rank 2 is free abelian, one does integer row and column operations on a 2-by-2 integer matrix and arrives at a diagonal matrix. Starting from ( , ,, ], we can do those operations here and arrive at ( n , j with GCD(a,d) = 1. This answer corresponds to having Za 0 Z<j as the quotient of the free abelian groups. Since GCD(a, d) = 1, Za 0 Z<j = Zn, and the theory shows we could have arrived at f n 1. For the conclusion M*(n) = |Ja Tor, we can apply Lemma 8.15 to ( ' A1 ) ^ M*(n) and see that I , ,, 1 is in a unique Ta such that Qr=(n ,},ad~n,d>0, and 0 < 6 < d. Since or must be primitive, we have also GCD(a,6,d) = 1. Hence M*{n) — (Ja Ta in the asserted fashion. For the index formula (9.5), we first show that the index is multiplicative. In fact, let GCD(n,m) = 1. Write M*{n) - \J{ Tai and M*(m) = (J; r# ^ in (9-4)- li is clear that Af *(nm) D \J.j TatiPj. We shall show that equality holds and that the Taify are disjoint. In fact, if 6 is in M*{nm)y then * = 7l(7 J) 72 by (9.4) (n 0\ (m (A = 7lU lJU lj* — 71 f ft - j 73/?j for some 73 £ T and some j = j4Qi/3j for some 74 E T and some i. Thus M*{nm) = \J{j Tottf^ If Ta^j = Ta^Py, then a{Pj = 7a,-//?,-/ and Pj'PT1 G or^Tor,/. From this relation we see that Pj'PT1 has entries in m~*Z and also in n_1Z. Since GCD(m,n) = 1, Pj'Pj1 is in T. Thus /?;- = Pji. Then a,' = 70^/ and or,- = or*'. To complete the proof of the index formula, we may take n = pr with p prime. We can simply count the number of ar's in (9.4). The number
258 IX: MODULAR FORMS FOR HECKE SUBGROUPS IS Pr+(pr--pr-2) + (pr- -Pr_3)+ •+(?-i) + i = Pr + Pr_1. as required. Lemma 9.2. For T = 5L(2,Z), Proof. The left side consists of all members of T of the form fN-1 0\fa b\(N 0\ _ / a bN'1} \ 0 l)\e d)\0 l)~ \cN d J with ( " * ) 6 T. Hence the left side is contained in the right side. If (* *) is in T0(N), then (• ») is in T and fa b\_(N 0\"V a bN\ (N 0\ \c d) ~ \0 \) \cN~l d J \0 \) Proposition 9.3. T — SL(2y Z) acts transitively by right multiplication on T\M*(N) with isotropy subgroup r0(AT) at the coset r f NQ J J. Therefore [r-.r0(N)] = N J] (! + ?)• (9-6) p prime p\N Proof. The action is transitive by (9.4). By Lemma 9.2, we are to show that the isotropy subgroup at r(^j) isTfl (^ j) r ( ^ ? ) * If 7 e T is in the intersection, then y = (^°\ W^Mand Hence the intersection is contained in the isotropy subgroup. In the reverse direction, if 7 is in the isotropy subgroup, then 7 is a member of r with
1. HECKE SUBGROUPS 259 i.e., -1 „/N 0\ (N 0\ r This says ( ^ J ) 7 ( ^ i J is in T. So 7 is in the intersection. It follows that \t\m*{n)\ = |r0(w)\r|, and the index formula (9.6) is then a consequence of (9.5). Example 1. Let N = p be prime. Then Proposition 9.3 gives |r/ro(p)| = p+ 1, and coset representatives for the left coset decomposition T = Uj=o^?^o(p) may be taken as &=(o 1) and # = (-i 0) for0^J'<p- (9.7) To see that these elements represent distinct cosets, we have only to 9p observe that fip xf3j = fij is not in Y0(p) and that is not in I\)(p) unless i = j. EXAMPLE 2. Let N = pr with p prime. One easily checks that coset representatives for the left coset decomposition T = U/^o(p) may be taken as and (A!) with 0</<pr_1 (9.8a) with 0< j <pr. (9.8b) Let iZ be the usual fundamental domain in 7i for 5L(2,Z). (If there is a need to be quite precise, we exclude the part of the boundary where Re r > 0, so that no two distinct points of R are congruent.) Write the left coset decomposition of 5L(2, 2)/Tq(N) as SL(2,l) = \J/3jr0(N)9
260 IX: MODULAR FORMS FOR HECKE SUBGROUPS and put Rn=\J0?1R. Then Rn is a fundamental domain for To(N). [In fact, if r € H is given, we choose (3 G 5L(2, Z) with fir 6 R. If we write /? = /?y7 with 7 € r0(iV), then yr is in yflj-1/? C Rn> So every point of H is congruent to a point of /fyv. Uniqueness is proved similarly.] A picture of this fundamental domain for To(2) appears in Figure 9.1. We use the fy *s of Example 1 above, namely 02 ■0 ?)• *=(-. *)■ *-(-i o) In terms of S = ( _x J ) and T = f J J J, #2 is comprised of R, Pq1R = S(R), fclR = ST(tf). The image under each nontrivial /?_1 of the part of R where Im r is large contributes to a set in the form of a cusp above the real line. Also we regard R itself as contributing a cusp. A feature of this diagram is that S(R) and ST(R) contribute to the same cusp. Thus there are only two cusps, at oo and 0, even though [SL(2,Z) : r0(2)] = 3. -2 -2/3 Figure 9.1. Fundamental domain for r0(2)
2. MODULAR AND CUSP FORMS 261 2. Modular and Cusp Forms It will help now to make systematic use of the notation 6(g,r) = cT + d for(/=(**) inSL(2,R). (9.9) This notation was introduced in the proof of Corollary 8.4. We recall that 6(g, r) satisfies %i02,r) = S(gi)g2T)6(g2ir) 6{g,T)-l=6{g-\gT). As in (8.47), we shall write / o \p]k{r) = «(/?, r)-kf(pT) for 0 G SL(2, R). For a matrix a of positive determinant, we let a# = (detar)_1/2a. Then a# has determinant one, and we define fo[a)k = fo[a*]k. (9.11) This construction is a group operation, in that / o [ona2]* = (/ o [on]*) o [a2]fc. (9.12) An unrestricted modular form of weight k £ Z and level N > 1 is an analytic function /onW with f(jr) = «(7, r)*/(0 for all y e T0(N). (9.13) Since f ~0 _x J is in To (AT), we may as well assume fc is an even integer. Such a function / is a modular form if it is "holomorphic at the cusps," and it is a cusp form if it "vanishes at the cusps." We need to explain these conditions on / at the "cusps." If R is the standard fundamental domain for 5L(2, Z), then we can regard oo as a boundary point of H that is also a limit point of R. Each set 0~lR, with /? € SL(2t Z), is also a fundamental domain for 5L(2, Z). If (J-1 = f ah V then /?_1(oo) = ^ is a boundary point of H and a limit point of fi~lR. Because of the geometry of j3~lR (as in Figure
262 IX: MODULAR FORMS FOR HECKE SUBGROUPS 9.1), we say that ^ is a cusp of/?-1. To eliminate the dependence on 0 in this definition of "cusp," it is customary to define "cusps" by taking QU {00} and identifying those equivalent under the group in question. With SL(2, Z), there is only one equivalence class, hence only one cusp. Now let us pass to To(N). With the above conventions, a cusp is an equivalence class of QU {00} under the action of To(N). If we write the coset space SL(2,Z)/r0(N) as SL(2,Z) = \J^r0(N), (9.14) 3 then each /^(oo) represents a cusp. [In fact, let ^ be in Q with GCD(a,c) = 1. Choose b and d with ad - be = 1, so that 0~l = (° *) is in 5L(2,Z) and has /?_1(oo) = f- Choose 7 € T0(N) and 0j as above with 0 = 0p. Then i~l{0Jl(oo)) = /?-1(oo) = f, so that ^ and /^(oo) represent the same cusp.] There can be duplication among the cusps 0J'l(oo). This is what happens in Figure 9.1. The condition that /J'^oo) and /?;_1(oo) represent the same cusp is that /^(oo) = j0J"l(oo) for some 7 6 To(N). This means that 0n0Jl fixes 00, hence is of the form ± ( l J for some sign and some /. A sufficient condition is that 0i0Jx is of the form if JJ). In To(p) this condition is satisfied by all pairs 0\,... ,0p in (9.7). But the condition is not necessary. In r0(8), which is discussed in Example 2, /?„•=( 9 .1 and 0j = ( I yield the same cusp (take 7 = f q 1 1), but 0i0Jl is not upper triangular. Suppose / is an unrestricted modular form of weight k and level N. Let us examine the behavior of / near a cusp 0~l{oo). We consider the function /o [/?~1]fc(/r), whose behavior near r = 00 (i.e., for Im t large) reflects the behavior of/ near /?_1(oo). Since r'(0 ^€^r(JV)/?nrwcr0(JV), (9.15) we have foyrl]k(T + N) = (fo\fi-*]k)o[(l0 ?)] (r) = V°\rl(l ^)^]*)°[r1]fc(r) by (9.12) = /o[r1]t(r) by (9.13).
2. MODULAR AND CUSP FORMS 263 Arguing as for (8.7), we see that f°[p-%(r)= £ WlN with qN = e""<N. (9.16) n=— oo We say that / is holomorphic at the cusp /^(oo) if <& occurs only for n > 0 and / vanishes at the cusp /^(oo) if cf, occurs only for n> 1. If 0 is replaced by a different coset representative 0y with 7 £ To(Ar), then / ° [(/?7)-1]* = (/ o IT"1]*) o IT']* = / o I/?""1]*, so that the expansion (9.16) is completely unchanged. Therefore our definitions do not depend on the choice of coset representatives. But more is true. Suppose 0fl(oo) and /^(oo) represent the same cusp, i.e., 0aPJl = ± (j [) for some 7 G T0(N) and / E Z. Then we have /°Wr1]*((i!)'-) = (/«»D9r1]t)o[±(j!)]t(r) since* is even =/°i±/r,(;!)wr) = /o[7/^1]*(r) = (/o[7]0«>WS",]*(r) = /o^71]t('-). Thus changing from 0 = /3j to 0 = /?,- makes cjf' in (9.16) get multiplied by e2*xlnlN. The validity of "holomorphic at the cusp" and "vanishes at the cusp" are unaffected. The expansion (9.16) involves qN = e2ntT^N, which is periodic in r of period N. It can happen that a smaller period will work for all /. Namely if /?_1(oo) is a cusp, then {iei\0-l(l[)(Jero(N)} (9.17) is a subgroup of Z that contains TV, according to (9.15). Let h be its positive generator. Going over the above argument shows that /o^-1]*(r + A) = /o|)?-1]t(r). Thus we are led to an expansion in powers of q^ = e2*tTth. We call h the width of the cusp.
264 IX: MODULAR FORMS FOR HECKE SUBGROUPS Examples. (1) The width of the cusp at oo is 1. (2) For r0(p) with p prime, the width of the cusp at 0 is p. (See Figure 9.1.) (3) For r0(4), there are three cusps—the one at oo of width 1 corresponding to one left coset in (9.14), one at 0 of width 4 corresponding to four left cosets, and one at — | of width 1 corresponding to one left coset. Proposition 9.4. Let y9_1(oo) be a cusp of T0(N)} where 0 is in SL(2,Z). Then there exists a unique c*t = f J in M*(N) with ad = N} d > 0, 0 < 6 < d, and GCD(a,6,d) = 1 such that (NQ ;)/r^70| (9.18) for some 7 in SX(2,Z). Moreover, the width of the cusp equals (f/GCD(a,d). Proof. The first conclusion follows from Lemma 9.1. For the second conclusion, we conjugate ( J J ) by (9.18), obtaining Thus *-(;:)'-(? ;)"K:rM(:;)- The left side is in T, and this equality shows it is in C :)"^z'(o;) if and only if al/d is an integer. By Lemma 9.2, it is in To(N) if and only if al/d is an integer. Hence / is in the subgroup (9.17) defining width if and only if al/d is an integer, and it follows that / = d/GCD(a} d).
3. EXAMPLES OP MODULAR FORMS 265 The spaces of modular and cusp forms of weight k and level N will be denoted Mk(To(N)) and Sk(To(N))> respectively. These spaces are 0 for k odd. 3. Examples of Modular Forms There are several constructions of modular forms in one situation from modular forms in another. Typically the new transformation law (9.13) is a consequence of some computation with matrices. Behavior at the cusps is handled by the following result. Proposition 9.5. Suppose that / is analytic in the upper half plane and, for each 0 in SL(2, Z), /ot/?"1]*^) has a holomorphic qtr expansion at oo, where qN = e2*irlN. If a is in M(m)y then (fo[a]k) o [/?_1]*(r) has a holomorphic qmN expansion at oo for each 0 in SL(2,Z). If all of these expansions for / have 0 constant term, then all of these expansions for / o [or]jb have 0 constant term. PROOF. Scaling a so that its entries are relatively prime, we can apply Lemma 9.1 we and rescale to write or/?"1 = 7"1 f J with 7 in SL(2,Z) and (j J J in M(m). Then (fo[a]k)o[^-%(r) = (fo[y-%)o^aQ J)] (r). (9.19) By assumption we can write / o [7_1]jb(*") = Y^=o cne2irtnTfN. Hence the right side of (9.19) is = (m-^d)-kfo[y-%(*i±!L) = (m-l'2d)-k J2 cne2winb^N^e2irina2r^mN\ (9.20) n=Q and the proposition follows. The first two examples give methods for obtaining modular forms of one level from those at a lower level, with the weight remaining the same. Example 1. If / is in Mk(TQ(N))t then / is in Mk(r0(rN)) for r > 1, and similarly for Sk(To(N)) and Sk(To(rN)). The new transformation law (9.13) is valid because a subset of 7's is involved, and the cusp conditions are already built into the properties relative to Tq(N).
266 IX: MODULAR FORMS FOR HECKE SUBGROUPS Example 2. If /(r) is in Mk(T0(N))} then f(rr) is in Mk(r0(rN)) for t > 1. The transformation law (9.13) follows from Lemma 8.20: (/o[(;Oi»)°w*w = ^oi(r.!)Tr(o!)"1woKof)W'-) =/•[(;;)!* w for 7 in To(rN). The conditions at the cusps are satisfied because of Proposition 9.4 with a = ( q ? ) • There is one more simple construction to obtain new modular forms from old ones. This one changes the weight. Example 3. The product of two modular forms for r0(AT), one of weight k and one of weight /, is a modular form of weight k + /. If one is a cusp form, so is the product. Now we give two completely different constructions, largely without proof. Example 4. Define G2{t) = YY'- for ren, (9.21) where the inner sum is taken over m such that (mtn) ^ (0,0). One can prove that this series converges nicely when summed in the order indicated. Moreover, we have G*l(t 4- 1) = G2{t), and one can prove that G2(-l/r) = t2G2(t) - 2irir. Then it follows that G2(^) = (cr + d)2G2{r) - 2«c(cr + d) (9.22) for [acbd) in 5L(2,Z). The argument of Proposition 8.1 shows that G2(r) = 2C(2) - 8tt2 J2 <rx(n)qn. (9.23) n = l G<?\t) = G2(t) - 2G2(2r). (9.24) n = l Put
4. L FUNCTION OF A CUSP FORM 267 Combining (9.22) and the identity (1 0\ (a b\(2 OV1 /a 26 \ 1,0 l) \c d) \0 \) " \c/2 d )' we see that G<2) o [y]2 = G*2) for all 7 G T0(2). Also it is not hard to see from (9.23) that Gr2 ' is holomorphic at the cusps. Therefore G^ *s in M2(r0(2)). Example 5 (Hecke). If p > 3 is prime and tj(t) is as in Corollary 8.9, then (rf{T)/r}(pr))2 is in Mp_i(ro(p)). The difficult thing to prove is the transformation law. To handle the behavior at the cusps, we consider the 12th power, which is Ap(r)/A(pr). Since p is prime, the only cusps are at 0 and oo. For A(r), the q expansion at oo begins with a multiple of q\ the same thing is true for the q expansion at 0 since Ao[[°^)1j]12 = A. For A(pr), the q expansion at oo clearly begins with a multiple of qp. For the q expansion at 0, we follow the steps of the proof of Proposition 9.5, using a = I £ J, /?-i = T-i = («-;), and (*») = (J J). Then (9.20) shows that (A o [a]k) o [P~l]k(r) begins with ql/p. We conclude that Ap(r)/A(pr) has q expnasion at oo holomorphic nonvanishing, while at 0 the first term involves qp~p. Hence (rf'(t)jfn(pT))2 is holomorphic at the cusps. Modifications to it will yield cusp forms. For example, let p — Yin— 1. Then {n{pr) / rf (t))2 has weight — (p — 1), and its q expansions begin with q° at oo and q-JiW- P) at 0. Multiplying by An = A"(*+1> = 772(*+1\ we see that r)(pT)2n(r)2 has weight 2 and has q expansions beginning with qn at oo and with qn-^p~^ = g(p+i)/(i2p) at 0. Hence r)(pr)2v(r)2 is a cusp form of weight 2 when p = — 1 mod 12. 4. L Function of a Cusp Form Let / £ Sk(Fo(N)) be a cusp form, and let /(r) = J2™=i c*?n be *ts q expansion at the cusp oo. The L function of / is the Dirichlet series *(*■/) = ££' (9-25) n — i
268 IX: MODULAR FORMS FOR HECKE SUBGROUPS just as in (8.30), and we have £ f(ia)a' — = (2x)-r(.)Z,(«,/) (9.26) for all values of s for which the integral converges. Lemma 9.6. Let / € Sk{To(N)) have q expansion f(r) = Yl™=i c*tfn at the cusp oo. Then (a) the function <p(r) = \f(r)\<rkl2 is bounded on H and invariant under T0(N) (b)|cn|<Cn*/2. PROOF. Write 5L(2,Z)/r0(N) as in (9.14). First we prove that <p is bounded on R^. The cusps of Rn are the points /?r1(oo), and it is enough to prove boundedness near each, i.e., boundedness of <p(/37 t) for Im r > 2 and all j. Since / vanishes at the cusp /^(oo), we have n=l and hence 1/ o \Pjl]h{r)\ < Cie-2^Im T»N for Im r > 2. (9.27) Thus 1/ o [^"^(^Klm r)*'2 < C\ for Im r > 2. (9.28) Since Im(/?r) = |g(^Tr)|2 for 0 € SL(2, R), (9.29) we have = \f °\PjlWT)6{f5-\Tf\ (Im r)*/2 |^r\T)|-* = l/o^-1]*(r)|(Imr)fc/2. (9.30) In combination with (9.28), (9.30) shows that <p is bounded on Rjsf. Replacing fij1 in the calculation (9.30) by a member 7 of Tq(N)} we see that <p is invariant under To(N). Hence (p is bounded on W, and (a) is proved. To see that (a) implies (b), we argue just as in Lemma 8.10, starting from (9.28) with ft; = 1.
4. L FUNCTION OF A CUSP FORM 269 For St = S*(r<)(l)), we obtained a functional equation for L(syf) as Theorem 8.11, using the transformation law for / under I i ~Q ). For To(^) with N > 1, the element (° "M is not in T0(N)y and the functional equation needs an extra hypothesis. This hypothesis will be given in terms of the effect of the matrix f „ n 1. Let aN = (£ -1), so that a* = J= (£ -1). If 7 = (c d)> then ^^N1 = (_M "a J* Therefore ONro(Ar)^1 C To(N). (9.31) For / G Mjb(ro(Ar)), we consider the map wpjf = / o [<*#]*. For 7 G ro(A^), we have (/ o [orat]*) o [7]* = (/ o [<xN7<Xn%) ° [<*"]* = / o [<*n]*, so that / o [ajv]k is an unrestricted modular form of the same weight and level as /. Let us record the fact that ws maps Mk(To(N)) to itself and Sk(To(N)) to itself. This result is a special case of Proposition 9.5. Proposition 9.7. The map wn carries Mk(To(N)) to itself and Sk{T0(N)) to itself. Proof. If/ is in Mk(To(N)), we apply Proposition 9.5 with m = N and or = or//. The proposition says that (wNf) o \fi-l]k(r) = f; cne2'inT'N°. (9.32) But wnJ is an unrestricted modular form for T0(N)t and hence (u>Nf) ° [/^~1]fc(r) is periodic with period N. Thus the coefficients cn in (9.32) are 0 unless N \ni and / is holomorphic at the cusp 0~l(00). Similarly if / is in Sk(Y0{N)), then wNf is in Sk(TQ(N)). Taking into account Proposition 9.7 and the identity (c**)2 = —I, we see that w^ is an involution on Mk{To{N)) anc* on St(To(N)). These spaces therefore each split as the sum of the eigenspaces for eigenvalues +1 and —1. In the case of Sk{To(N)), we denote these eigenspaces by St(T0(N)).
270 IX: MODULAR FORMS FOR HECKE SUBGROUPS Theorem 9.8 (Hecke). Let / 6 Sk(To(N)) be a cusp form in one of the eigenspaces Sl(TQ(N)) ofwN} where e = ±. Then L(s, f) is initially defined for Re s > | -I- 1 and extends to be entire in s. Moreover, the function A(s,/) = Nst7(2ir)-'r(s)L(sJ) (9.33a) satisfies the functional equation A(s, /) = e{-l)k'2A(k - 5, /). (9.33b) Remarks. The proof is similar to that of Theorem 8.11, except that ( l ~~o ) £ets reP'acec' by a*. The integration from 0 to oo is broken at l/y/N instead of 1 because i/y/N is the fixed point of w^ on the imaginary axis. Proof. The initial region of convergence for L(s, /) is Re s > ^ + 1, by Lemma 9.6b. Since w^f = ef, the inversion law for / under the action of wpj is f^)=sNk^2ikakf(ia). (9.34) By (9.26) we have A(sJ) = N*l2 r f{ia)as~l da. (9.35) Jo From (9.27) we see that / f{i<r)<T*-1 da (9.36) Jl/y/N converges for all s 6 C and defines an entire function. We rewrite (9.35) for Re s> \ + 1 as rl/VN i-oo A(*f /) = N9'2 / f{i<r)a9-1 da + N9'2 I f{i<r)a9^ da. Jo Ji/Sn In the first term, we replace a by (Na)~l and then use (9.34). The result is TOO ^OO A(«, /) = eJV*<fc-*>i* / fiiff)^—1 d<r+N'f2 / /(«o>'_1 da. (9.37)
5. DIMENSIONS OF SPACES OF CUSP FORMS 271 In view of (9.36) the first term extends to be entire in s, and hence so does A(s,/) itself. Since T(s) is nowhere 0, L(s,f) is entire. Replacing s by & — s in (9.37) and multiplying by eik = e(—1)*^2, we obtain e(-l)*/2A(*-5,/) = AT*' /°° f(i*)*-1 <Lr + eN&k-*ih f°° /(i*)*"-'-1 da. Jl/y/N Jlfy/N Comparison with (9.37) yields (9.33b). 5. Dimensions of Spaces of Cusp Forms The kind of explicit argument used to prove Theorem 8.6 is not available for Fo(N). But we can get at the finite-dimensionality of Sk(To(N)) rather painlessly, and this finite-dimensionality will be relevant in obtaining simultaneous eigenvectors for the Hecke operators. Theorem 9.9. The vector space Mfc(r0(Ar)) is finite-dimensional. Sketch of proof. (Some of the steps will be treated in more detail in Chapter XL) The first step is to make K = T0(N)\H into a Riemann surface by introducing appropriate local parameters at the fixed points of elements of To(N) of finite order. Then we form a compactification TV by adjoining one point for each cusp and using local parameters built from q in the case of the cusp at oo, or a transformed qn (h being the width) in the case of the other cusps. The result is a compact Riemann surface. Let / be in Mjb(ro(A0)- Then f12 is in Ml2k(To(N))t and so is A*. Hence F = /12/A* is an analytic function on % invariant under To(N). The function F descends to a holomorphic function on It that extends to be meromorphic on TV. The only possible poles are at worst of order k, coming from the zeros of A*. These poles occur at worst at each cusp, and the number of cusps is < /i = [SL(2, Z) : ro(A^)]. Hence the number of poles of F, counting multiplicities, is < kfi. It follows that the number of zeros of F, counting multiplicities, is F'(r)dT < kfi. In fact, the meromorphic differential forma; = —J, is globally well defined on 1Z*, as a consequence of (8.19). Let us triangulate K* so that each singularity of a; is in the interior of one triangle. Let Ai,..., A/ be the 2-simplices in the triangulation. By the Argument Principle, 1 ' f #{zeros of F] - #{poles of F} = — Y* / »>
272 IX: MODULAR FORMS FOR HECKE SUBGROUPS where 3A;* is the boundary of Aj with positive orientation. In the sum on the right, each 1-simplex appears twice, with opposite orientations, and the right side is therefore 0. We conclude that F has < fc/i zeros, counting multiplicities. Now let / and therefore F vary. Select t/i+1 distinct points pi,..., pkfi+i in 71 and consider the linear map Mk(To(N)) —► C*M+1 given by / —► (F(pi),..., F(pjtp+i)). By the above, this map has 0 kernel. Hence dimMk(r0(N))<kii+l. Our interest ultimately will be in S2(T0(N)). With more machinery, one can calculate the dimension of S2(To(N)) exactly. We shall say what is involved without giving the details. The first step is to compute the genus of the compact Riemann surface TV in the proof of Theorem 9.9. This is possible because we have a natural map IV —+(SL(2}Z)\Hy (9.38) as a consequence of the canonical quotient map r0(A0\5L(2,R) —► 5L(2,Z)\5L(2,R). Applying the Riemann-Hurwitz formula to this map gives a formula for the genus of IV. The space S2(ro(Af)) 1S easily seen to be isomorphic to the space of holomorphic differentials on IV, and the latter space can be shown to have dimension equal to the genus of IV. We summarize as follows. Theorem 9.10. The dimension of 52(r0(A0) is ji(N) n(N) k(N) fioo(N) * 12 4 3 2 > where fi(N) = [SL{2, Z)/r0(AO] = N IJ(1 + I) r o if 41 at ^'"irwo+tir)) otherwise ( 0 if 2 IJV or 9 I TV /*oo(tf) = 2>(GCD(cWd)). Here (^jp) and (^p) are Legendre symbols (referring to quadratic residues), and <p is the Euler ip function.
6. HECKE OPERATORS Corollary 9.11. If p > 3 is prime, (n-l dim52(r0(p)) = i n n , n+1 then if p= 12n + 1 ifp= 12n + 5 ifp= 12n + 7 ifp= 12n+ll 273 6. Hecke Operators As with r0(l) = SL(2,Z), Hecke operators for T0(N) are built from lattices. However, additional structure plays a role. Instead of working with lattices A, we shall work with pairs (A,C), where A is a lattice in C and C is a cyclic subgroup of C/A of exact order N. Such a pair we shall call a modular pair. If A is a lattice, we write Pa for the quotient map of C into C/A. To a modular pair (A,C) and a nonzero complex number or, we can associate another modular pair (orA,aC) as follows: The lattice orA is as earlier, and the cyclic subgroup aC is given by aC=PaA(aPXl(C)). (9.39) A complex-valued function / on modular pairs is said to be homogeneous of degree —k if /(aA, aC) = c*-*/(A, C) for a € Cx. (9.40) To such a function /, we can associate a function / on H by the definition /(r) = /(AT(PA,(iZ)). Let us check that this function / satisfies f(yr) = 6{yt r)kf(r) for y € T0(N). (9.41) In fact, we recall from §VIII.2 that AyT = 6(7, r)_1AT. Thus we have /(77-) = /(A7T)PATr(iZ)) = /(«(7,r)-1Ar,P«(7,r)-A,(^Z)) = *(7,r)*/(AT,*(7,r)P,(7>r)-,Ar(£Z)) by (9.40) = 6(y,r)kf(AT,PAr(6(y,T)p-!fT).1ArPHy>r)-iAr(jrl))) by (9.39), and this is
274 IX: MODULAR FORMS FOR HECKE SUBGROUPS = *(7,r)fc/(Ar,^Ar(*(7,r)(^Z + *(7,r)-1AT))) = 6(7,r)l/(AT)FAr(5(7,r)iZ + Ar)). In turn this is = «(7,r)*/(A„PAr(^Z)) = 6(7,T)kf(r), since easy computation shows that (cr + d)jjl + Z + rZ = £Z + Z + rZ if (" J) is in r0(AT). This proves (9.41). Conversely suppose that / is a complex-valued function on 7it We shall define a function / on modular pairs such that (9.40) holds. Let (A,C) be given. Then A is a sublattice of index N in P^^C) with P^l(C)/X cyclic. It follows that we can find a Z basis {u>i,u>2} of A such that {-^0^1,^2} is a Z basis of P\l(C). We may assume moreover that Im(o;2/wi) > 0. In terms of this basis, we define /(A,C) = wrV(«2M). (9.42) To see that /(A,C) is well defined, let {u^u^} t>e another basis of A such that {j^wj,^} *s a basis of P^X(C) and Im^/^i) ls positive. If we write (:?)=(: *)(:)• (^» then ( a J is in SL(2,Z) since {u>i,u;2} and {u^o^} are both properly ordered bases for A. Since {-^u^,^} and {j^ui,^} are both bases for P^l(C), the equality (ft) -(*.?) (a) <^> forces the coefficient matrix to be integral. Thus TV | c, and 7 = lflM is in ro(A^). By (9.41) we have «i"*/KM) = <*/(7(w2M)) = u>i"~*(c(u/2/wi) + d)kf(u2/ui) = ^rfc/(w2M),
6. HECKE OPERATORS 275 and thus (9.42) is well defined. Finally we check the homogeneity of /. If {u/i,u>2} is a properly ordered basis for A as above, then {au>i,aru2} is a properly ordered basis for orA. Hence f(aA,aC) = (erWl)-fc/((«"2)/(<*Wi)) = a~k }{\,C), and / satisfies (9.40). The two constructions / —► / and / —► / are inverse to one another, and thus we have a one-one correspondence between functions / on modular pairs homogeneous of degree —k and functions / on 7i satisfying (9.41). Let C be the free abelian group freely generated by the modular pairs (A,C). In analogy with (8.43), the Hecke operator T(n) on modular pairs, for n = 1,2,3,..., is defined to be the map T(n) : C —► C given by r(n)(A,C)= £ (A'.C). (9.44) [A:A')=n The notation nC -» C means the following: We have an inclusion nA C A' and an induced map C/(nA) —► C/A'. Under the induced map the cyclic group nC is to map onto the cyclic group C". The sum (9.44) is a finite sum since nA C A' C A and since (A,C) and A' uniquely determine C". It is easy to check that the condition nC -» C" is automatically satisfied if GCD(n, N) = 1. One might expect that nC -» C could be replaced by C" -» C and that an equivalent or perhaps better theory would result. But this expectation is misplaced. For one thing, (A, C) and A' would no longer determine C'\ for another, the analog of Proposition 8.14 would break down in a serious way. We define Tk(n) on the space of functions on modular pairs, homogeneous of degree — k} just as in (8.44): (Tfc(n)/)(A,0 = n*-1 £ /(A'.C1). (9.45) [A:A']=n It is clear that Tk(n)f is another function on modular pairs, homogeneous of degree —k. Shortly we shall exhibit the corresponding operator on functions of the variable t eH, and we shall see that it carries modular forms for To(N) to modular forms for ro(^V), and cusp forms to cusp forms. This operator too will be denoted Tk{n) and is called a Hecke operator.
276 IX: MODULAR FORMS FOR HECKE SUBGROUPS Let us compute the relationship between (A,C) and (A',C") in (9.45) in terms of matrices. Choose a basis {^1,^2} of A such that {^ui,^} is a basis of P^l(C) and Im(w2/wi) > 0, and let {(*>[,u>'2} be a similar sort of basis for A'. Then we can write with f'M in Af(n). The inverse images of nC and C are given by ^»a ("C^ = -£Zwi + nZw2 (947a) N PZ.l(&)=jjM+M- (9-47b) Since Jr*i = ^(oX^a + ^1) - (jf) (<*>* + ^1) = ^(aKi - (£) w2 and nu/2 = (-6n)(cu>2 4- dwi) + (nd)(aw2 4- ta>i) = — (—bnNJa/j 4- (nc/Ju^, c (9.47a) is contained in (9.47b) if and only if — is an integer. Thus nC c maps into C" if and only if — is an integer. Suppose this happens. For nC to map onto C", tt^i must have exact order N relative to A'. The order is the least m > 1 such that nm , , —td! = ru)x 4 slj2 for some integers r and s. Inversion of (9.46) allows us to substitute lj\ = — (—cu4 4 au)[) and rewrite this condition as n 771 —(-cu>2 + au;i) = ru^ 4- s^2. Since TV | c, the condition is just that ma/N = r. The order is therefore JV/GCD(a, AT), and order AT occurs if and only if GCD(a, N) = 1. We conclude that {u^,^} and {ui,^} are related by a matrix in the set M(nyN) = {(«*)g M(n) | c = 0 mod AT and GCD(a,AT) = 1}. The most general similar sort of basis for A' is related to {u/{, cj2} by a member of Yq(N), according to (9.43). Thus the modular pairs (A', C) in the sum (9.44) are parametrized by the right cosets T0(N)\M(n1 N).
6. HECKE OPERATORS 277 Proposition 9.12. Let {a,} be a complete set of representatives for the right cosets r0(N)a of T0(N) on M{n,N). If / is in A/*(r0(AO), then Tk(n)f is given as a function of r by Tfc(n)/=nt-1^/o[a,]t. (9.48) Hence Tk(n)f is an unrestricted modular form of weight jfc and level TV. Proof. The argument is the same as for Proposition 8.14. Corollary 9.13. The Hecke operator Tk(n) carries Mk(To(N)) to itself and Sk(T0{N)) to itself. Proof. If/ is in Mk(To(N))i we apply Proposition 9.5 with m = n, a = ait and 0 € 5L(2,Z) to see that (/ o [a,]*) o [/?*"1]t(r) has a holomorphic qnjv expansion at oo. By (9.48) the same thing is true of (Tjb(n)/) o [/?_1]fc(7")- But Tt(n)f is an unrestricted modular form for r0(AT), and hence (7*(n)/) o[/?_1]*(t) is periodic with period N. Hence the terms in the qnN expansion whose index is not a multiple of n vanish, and Tk(n)f is holomorphic at the cusp /?_1(oo). A similar argument works if / is in Sk(To(N)). To see concretely the effect of Tjt(n) on q expansions, we use explicit coset representatives a,- in (9.48). Arguing as in the proof of Lemma 8.15 and noting that N will have to divide x at the start of the proof, we arrive at the following. Lemma 9.14. The matrices I ,1 with ad = n, d > 0, GCD(a,JV) = 1, and 0 < b < d are a complete set of coset representatives for the right cosets of T0(N) on M(nyN). REMARK. The set of representatives here is a subset of the set in Lemma 8.15. The subset is proper if and only if GCD(n, N) = 1. Proposition 9.15. Let / in Mk(r0(N)) have q expansion /(r) = H2^Locn<In- Then Tk(m)f has q expansion Tk(m)f(r) = f>„g", n=0
278 IX: MODULAR FORMS FOR HECKE SUBGROUPS where CoE a|m,a>0 <* * if M = 0 GCD(a,N)=l bn= { cm if n = 1 Ea|GCD(n,m) <* ~1Cnm/a* if H > 1. GCD(a,JV)xl Proof. The argument is substantially the same as for Proposition 8.16. The condition GCD(a, N) = 1 needs to be carried along throughout. Next we examine how the Tk(n) interact with one another. First we formalize the definition of dilation on modular pairs made in (9.39). We let fl(fi)(AfC) = (nAfnC). This definition is consistent with the one in §VIII.7, and R(n)T(m) = T(m)R(n) for all m and n. The following lemma generalizes Lemma 8.18. Lemma 9.16. (a) For a prime power pr with r > 1 such that p \ Ny T(pr)T(p) = T(pr+1) + p ■ R{p)T{pr~l). (9.49) (b) For a prime power pr with r > 1 such that p \ N, T(pr) = T(Py. (9.50) (c) T(m)T(n) = T(mn) if m and n are relatively prime. Proof. Parts (a) and (c) are proved in the same way as for Lemma 8.18, the cyclic group being of no significance in the argument. For (b), we shall show that T(Pr) = T(pr-l)T(p), (9.51) and then the result follows by induction. If T(p)(A, C) contains a term (A",C") and T(pr-1)(A",C") contains a term (A'.C), then [A:A"]=p, pC-^C", ^".A^p*-1, p'-'C'^C. (9.52) Hence [A : A'] = pr and prC -» pr-lC" -» C. Thus T(pr)(A,C) contains the term (A', C).
6. HECKE OPERATORS 279 In the reverse direction, if T(pr)(A,C) contains (A',C"), it contains it only once. So we want to see that there is only one A" such that (9.52) holds. Fix such a A". Choose a basis {u>i,u;2} of A so that {jjU>itV2} is a basis of P^l(C) and Im^/^i) > 0. Fix a similar basis {w}',^} of A". Changing basis for A" by a member of To(N), we may assume that Wx)~\0 d)\UJ with the matrix as in Lemma 9.14. Since ad = p and GCD(a,TV) = 1 and p | TV, we must have a = 1. Thus Similarly we can find a basis {u^i,^} of A' such that tt)-(j^)tt)-«'<^- Then (:!)=(: 6+/p)te) The uniqueness in Lemma 9.14 shows that 6 -I- b'p here is determined by A and A'. But 6+ b'p determines b and 6'. Hence A" is completely determined by A and A7. This proves (9.51) and the lemma. To get from operators on modular pairs to operators on functions / on modular pairs, we use the definitions of (8.51) and (8.52). Lemma 9.16 then yields the following analog of Theorem 8.19. Theorem 9.17 (Hecke). On the space Mk(Y0(N)), the Hecke operators satisfy (a) For a prime power pr with r > 1 such that p \ TV, Tk(pr)Tk(p) = W+1) + p*"1 W1). Hence 7*(pr) is a polynomial in Tk(p) with integer coefficients. (b) For a prime power pr with r > 1 such that p | TV, (c) Tjt(ra)Tjfc(n) = Tk(mn) if m and n are relatively prime. (d) The algebra generated by the Tjb(n) for n = 1,2,3,... is generated by the 7*(p) with p prime and is commutative.
280 IX: MODULAR FORMS FOR HECKE SUBGROUPS As a consequence of Lemma 9.6, we can define a Petersson inner product on Sk(To(N)). The definition is where Rjsr is a fundamental domain for ro(N). The inner product does not depend on the choice of fundamental domain. In an effort to prove an analog of Theorem 8.22 (that the Tk(n) are self adjoint), we again take n to be a prime p, by Theorem 9.17d. Let Rs be a fundamental domain for T(pN) obtained as the union of translates of R^. Then the argument in the proof of Theorem 8.22 still shows that / /o M*(r)M7) ^? = / /W*o[4(V ^ (9.53) for a e M(p)t just as in (8.62). We now assume that p \ N. For the element or,- = ( 1, choose integei Then ( rs x ,0 and -6^ y so ) = l that p2x - (px — bN I N -Ny :)« = 1 + pbN. {1% xb + y + pb\ Nb + p2 ) -l shows that a{ = 7ia,-7t- for some 7,- and 7,- in Tq(N). Inverting this equation for 6 = 0, we see that we have a similar relation a'{ — 7,or,'7t- for or,- = I JT 1 J. Thus the same argument as the one in Theorem 8.22 that obtained that theorem from (8.62) is applicable here, but only under the assumption p \ N. We summarize as follows. Theorem 9.18 (Petersson). The Hecke operators Tk(n) with GCD(n, N) = 1, on the space of cusp forms Sk(To(N))} are self adjoint relative to the Petersson inner product. It is not always true that the operators Tk(n) with GCD(n,7V) > 1 are self adjoint. Thus Theorem 9.18 does not give us quite as good a result as we had with 5L(2, Z). Since the Hecke operators commute, we can conclude that Sk(To(N)) splits into the orthogonal sum of simultaneous eigenspaces for the operators Tk(n) with GCD(n,7V) = 1. An eigenvector cusp form under the X*(n) with GCD(n,7V) = 1 is called an eigenform; eigenforms in the same such eigenspace are said to be equivalent.
6. HECKE OPERATORS 281 Proposition 9.19. The involution wn of Sk(To(N)) given in §4 is self adjoint and commutes with all Tjk(n) such that GCD(n, N) = 1. Proof. We have ws(f) = /o [<*#]*. The argument of Theorem 8.22 establishes (9.53) for a = a^ if we regard Rn as a fundamental domain for r(7V2). But [a'N] = [<*#], and it follows that wN is self adjoint. To prove that w^Tk(n) = Tfc(n)ti;jv, we may assume that n is a prime p with p f TV, by Theorem 9.17. The matrices or, to use in the formula (9.48) for Tk(p) are (S !)■ (J !)• (i:) -^*sr-i. c«) What is to be checked is that ^2f°[<*N<Xi]k = $^/©[«•«*]*. (9.55) » t The sum of the contributions from the first two matrices in (9.54) to each side of (9.55) is the same since aN(po T) = {1 l)aN and °"(o p)=(s °i)qn- We shall show that the sum of the contributions from the other matrices in (9.54) to each side of (9.55) is also the same. Specifically we shall show that the matrix with 6 contributes to the left side what the matrix with e contributes to the right side, where 1 < e < p — 1 and e = (—Nb)~l mod p. In fact, we can calculate that \N 0 ) \0 p) = \-Nb p-l(l+ebN)J \0 p) \N o)' The first matrix 7 on the right side is in To(N)i and /o [7]* = /. Thus /oh0 p)L=/°[(J pM as asserted. Consequently the decomposition of Sk(To(N)) into spaces of equivalent eigenforms is compatible with the decomposition of 5*(ro(N)) into S+(r0(N)) and S;(T0(N)). The Hecke operators Tk(n) with GCD(n, N) / 1 commute with the other Tjb(n) and hence map the spaces of equivalent eigenforms into themselves. In each such space there will be at least one eigenvector of all the 7i(n), and its eigenvalues will be as in Proposition 9.20 below. We cannot assert anything at this stage about a relationship between wn and the operators Tt(n) for GCD(n, N) ^ 1.
282 IX: MODULAR FORMS FOR HECKE SUBGROUPS Proposition 9.20. Suppose / £ Sk(To(N)) ls an eigenform that is an eigenvector of all 7i(n), say with Tk(n)f = A(n)/. If the q expansion of / at oo is /(r) = ]CnLi CnQn» ^hen cn = A(n)c!. (9.56) Consequently (a) / ^ 0 implies c\ ^ 0 (b) the system of eigenvalues {A(n)} determines / up to a scalar. Proof. This follows from Proposition 9.15 in the same way that Proposition 8.23 follows from Proposition 8.16. Under the assumptions of Proposition 9.20, we can normalize / so that the q expansion /(r) = YlnLi cn9n nas ci = 1. Then (9.56) says that cn is the eigenvalue of Tk(n). From Theorem 9.17 we see that CprCp = Cpr+i + pk~lcpr-i for p prime, p f N (9.57a) cpr =z [cp)r for p prime, p \ N (9.57b) cmcn = cmn if GCD(m, n) = 1. (9.57c) As in §VIII.8 it follows from (9.57) that the L function L(sJ) = — c Yl™=i ~ nas an Euler product expansion. In this case the Euler factor when p { N is quadratic, as before. But when p \ TV, it is first order of the form 1 - cpp~> We summarize part of this discussion as follows. Theorem 9.21 (Hecke-Petersson). The whole space Sk(T0(N)) of cusp forms is the orthogonal sum of the spaces of equivalent eigenforms. Each space of equivalent eigenforms has a member that is an eigenvector for all 7]fc(n). Any eigenform / in Sk(^o(N)) that is an eigenvector for all Tk(n) can be normalized so that its q expansion /(r) = Y^=i cnQn has c\ = 1. With such a normalization the coefficients satisfy (9.57). Moreover, the L function L(s,f) has an Euler product expansion £(*, ./)- n [tt^t] n [- p prime p\N piN Cpp-s+pk-1-23 p prime rt (9.58) convergent for Re s > \ + 1.
7. OLDFORMS AND NEWFORMS 283 7. Oldforms and Newforms Theorem 9.21 is not the same kind of definitive result that we obtained in Chapter VIII. Since we were unable in Theorem 9.21 to relate Wj\[ to the Tk(p) for which p | N, we ended up with no correlation betweeen L functions having Euler products and L functions having functional equations. It turns out that the difficulty is the presence of cusp forms that come trivially from lower levels. These are the cusp forms considered in Examples 1 and 2 in §3. A specific example is the pair of members A(r) and A(2r) of 5i2(r0(2)). We shall see after (9.61) that both are eigenforms with the same eigenvalues under all T\2{n) with n odd, yet they are linearly independent. More generally the first kind of example was /(r) in Sk(r0(N)) when /(r) was given in Sk(r0(N/r)) and r | N. When GCD(n,N) = 1, the formula for Tk(n)f is the same relative to To(N) as relative to T0(N/r). Hence an eigenform for To(N/r) becomes an eigenform for To(N) with the same eigenvalues. The second kind of example was f(rr) in Sk(To(N)) when /(r) was given in Sk(T0(N/r)) and r \ N. Let Ak(r) be the operator 4k(r)/=/o[(j J)] - (9.59) Since f(r) = £~=1 cnq* has Ak(r)f(T) = rk'2f(rT), (9.60) we know from §3 that Ak(r) carries Sk(T0(N/r)) to Sk(r0(N)). We shall see in Lemma 9.23 below that Ak(r)Tk{n) = Tk(n)Ak(r) if GCD(n, N) = 1. (9.61) Consequently if /(r) is an eigenform for T0(N/r)y then f(rr) is an eigenform for To(N) with the same eigenvalues. We can combine the two examples in sequence as follows: If r\r2 \ N and if /(r) is an eigenform for ro(Ar/(r1r2)), then f(r2r) is an eigenform for To(N) with the same eigenvalues. Such an eigenform we call an oldform. The linear span of the oldforms is denoted Sfd(To(N))1 and its orthogonal complement is denoted SjJew(ro(Af)). The eigenforms in S^eYf(T0(N)) are called newforms for T0(N). Since Tk{n) is self adjoint when GCD(n,7V) = 1, 5few(r0(AT)) is spanned by newforms. The key result, whose proof we omit, is the following Multiplicity One Theorem.
284 IX: MODULAR FORMS FOR HECKE SUBGROUPS Theorem 9.22 (Atkin-Lehner). If / £ Sk(T0(N)) is a newform, then its equivalence class is one-dimensional, i.e., consists of the multiples of /• Before deriving some of the consequences for newforms of Theorem 9.22, we supply a proof of (9.61). Lemma 9.23. If GCD(n, N) = 1 and r | N, then Ak(r)Tk(n) = Tk(n)Ak(r) on Sk(r0(N/r)). Proof. Let ( 0 , 1 range over the usual matrices with ad = n, d > 0, GCD(a,JV) = 1, and b in a complete residue system modulo d. (Since GCD(n,;V) = 1, the condition GCD(ayN) = 1 is vacuous.) By Lemma 9.14 and (9.48) «»-t/.[(s ';)(; !)].■ Since br goes through a complete residue system modulo <f, the right side is = Ak(r)Tk(n)f, and the lemma follows. We return to the consequences of Theorem 9.22. The operators ws and Tfc(p) with p | N commute with the 71b(p) having p \ TV, and thus they map each equivalence class into itself. If / is a newform, then the theorem says that C/ is an equivalence class. Consequently / is an eigenvector for w^ and the Tk(n) with p \ N. The first of these conclusions means that L(s, /) has a functional equation, according to Theorem 9.8. The second of these conclusions means that L(s^f) has an Euler product expansion (after / is normalized), according to Theorem 9.21. But there are further operators of this kind if N is composite, and they give some insight into the eigenvalue ±1 of w^. Fix a prime p
7. OLDFORMS AND NEWFORMS 285 dividing N, and let Q = pl be the exact power of p dividing iV. Then we can choose integers ao and 70 with Qao — (N/Q)7o = 1, and the matrix (Qao l\ \Ny0 Q) will have determinant Q. More generally we consider any matrix »«> = (£ «*) (962) of determinant Q. For / in Sjb(r0(AO), let wQf = fo[w(Q)]k. Lemma 9.24. The operator wq carries Sk(Tio(N)) into itself. It is independent of the choice of defining matrix w(Q), and Wq = 1. It commutes with all T*(n) with GCD(n,N) = 1. If p' is another prime dividing N and if Q' is the corresponding Q> then wq and wq* commute. As p varies over all primes dividing N, the product of the various wq's is wn. Proof. Let f a b J be in To(iV). Carrying out the multiplications, we see that w(Q) (* J) w(Q)"x is in Fo(N). The same argument as with wn in §4 then shows that wq carries St(To(N)) into itself. If we have two representatives w(Q) and V)'{Q)> we calculate that wiQWiQ)"1 is an element 7' of T0(N). For / in Sjfc(r0(AO), we therefore have / o [w(Q))k = (/ o [y')k) o [w\Q))k = / o [w'(Q)]k. Thus iuq is independent of the representative. Similarly we calculate that Q~1w(Q)2 is in r0(7V), and it follows that w2Q = 1. For the commutativity with Tjb(n), it is enough to prove commuta- tivity with 7*(p'), where p' is a prime with p' \ N. Using the facts that p' { N and p' ^ p, we shall choose the representative w(Q) in (9.62) to be congruent to f ^ J modulo p'. Namely choose an integer Q\ with QQ\ = 1 mod p'. Let Qo = Q\ + mp', where m is the product of all primes dividing N but not Qx; then QQ0 = 1 mod p1 and GCD(Q0, W) = 1. Since GCD(Q7Q0,Np'2) = Q, we can choose a' and 7' with Q2Q0a' - Np,2y = Q. Then
286 IX: MODULAR FORMS FOR HECKE SUBGROUPS has the required properties. If m(b) denotes I 0 , J, then we can calculate that m(b)w(Q)m(Q0b)-1 = w'(Q) with w'(Q) of the form (9.62). Hence £ (fo[m(b)]k)o[w(Q)}k= £ (»Q/)°MQo»)U bmodp' bmodp* = £ (™«/)°M6)]*- 6modp' (9.63a) With the same w'{Q) we have (i !H(S !)"'-"«»• and with another w"{Q) we have Thus (/.[(J J)]>W«U-^/).[(J J)]. (/.[(J ;)]J.M<BU-(^/).[(S ?)], (9.63b) (9.63c) Adding the equations (9.63), we obtain wQTk(p)f = Tk(p)wQf as required. If p' is another prime, we calculate that (w(Q)w(Q'))(w(Q')w(Q))-1 is in To(N). Then it follows that wqwq> = wquvq on 5*(ro(N)). For the product formula for wn, we extend the definition of w(Q) in (9.62) to any divisor Q of N with GCD(Q, 7V/Q) = 1. If Q and <?' are relatively prime such divisors, we find that w(Q)w(Q') is a version of w(QQ'). Hence the product of all w(Q) with Q a prime power is a version of w(N). But I ^ "! J is another version of w(N). Then it follows that wn is the product of all wq with Q a prime power dividing N such that GCD(Q, N/Q) = 1. This completes the proof of the lemma.
7. OLDFORMS AND NEWFORMS 287 If p is prime, let To(r,p) be the subgroup of To(r) given by To(r,p) = | (° J) 6 r0(r) | 6 = 0 mod p} . Lemma 9.25. If p is prime, then a complete set of right coset representatives for To(r,p) in To(r) is (o i) with °-* -v~l ifplr (964a) (J {) with 0<j<p-l, and ^ J)ifpfr, (9.64b) where a and y are integers such that pa — ry = 1. Proof. Let I , J be given in To(r,p). If p { a, we can choose j so that p | (6 — aj), and then (; i) >"- r»<^(; 0- If p | r, then p | a is impossible, and the elements (9.64a) represent all cosets. If p \ r and p | a, then (: J) """ r°^{n !)• Hence the elements (9.64b) represent all cosets in this case. We readily check that the elements in question represent distinct cosets, and the proof is complete. Lemma 9.26. Let / be in Sk{To(N)), and let p be a prime dividing N. Up2 | N, then Tk(p)f is in Sk(T0(N/p)). Up\N and p2 \ N, then Tk{p)f + P*'lwpf is in Sk(TQ(N/p)). Proof. Let y = ( * J j be in TQ{Nfp,p). Then (s?)"t(s?)=U J)
288 IX: MODULAR FORMS FOR HECKE SUBGROUPS is in ro(N). Hence /o (S!)l-/-[US)(!!r]. and /o (! 01 transforms according to To(N/p}p). Let {.ft,-} be a system of right coset representatives for r0(AT/p,p) in YQ(N/p). If 7 is in TQ(N/p), then {^7} is another system of right coset representatives. Hence (9.65) and the left side of (9.65) transforms according to rQ(N/p). If we use the representatives in Lemma 9.25, the desired result now follows. In fact, (p 0VV1 A 1/1 0W1 i\ _ 1 /1 A Vo lj Vo lj-pVo pA° iAp\° p) shows that the contribution to the left side of (9.65) from all f n *? J is just p"^+1Tjt(p)/. If p I N and p2 f TV, then p \ {N/p)y and there is one more coset representative, namely ( PQ \N7/p J. It sati satisfies (p 0\ l f pa l\_ 1/1 0\ / pa l\ 1,0 lj V^T/P iApVO Pj\Ntlp l) and the contribution to the left side of (9.65) is wpf.
7. OLDFORMS AND NEWFORMS 289 Theorem 9.27 (Atkin-Lehner). The space 5Jew(r0(^)) of newforms is the orthogonal sum of one-dimensional equivalence classes of eigen- forms. If / is such an eigenform, then / can be normalized so that its q expansion /(r) = YlnL\ cn9n has c\ = 1. In this case the eigenform / is an eigenvector of all X*(n) for all n, of all wq for primes dividing N, and of wn, and the eigenvalues are as follows: (a) Tk(n)f = cnf for all n m (b) wqf = X(Q)f with X(Q) = ±1, if p | N and Q corresponds to p (c) u,Nf = UpiNHQ)f- Moreover, cp = 0 if p2 \ N, while cp = — p'J""1A(p) if p | N and p2 f AT. Consequently the L function L(s, /) has an Euler product expansion £(«,/)= n p pnme l ,4-1-' n p prime 1 U — cpP~* +p' k-1-23 l + \{p)p>- (9.66) and L(s,/) satisfies a functional equation (9.33) with e — IlpiN ^(Q)- Proof. The first sentence is by Theorem 9.22 and the self adjointness of the Tk{n) with GCD(n,7V) = 1. Since the Tjk(n) commute, any such eigenform must then be an eigenvector of all Tk(n). By Theorem 9.21, c\ / 0; also if / is normalized to make c\ = 1, then (a) holds. Lemma 9.24 shows that / is an eigenvector of all wq. Since Wq = 1, (b) holds. Lemma 9.24 also proves (c). Let p2 | N. By Lemma 9.26, Tk(p)f is an oldform equivalent to /. Since / is orthogonal to 5Jld(r0(7V)), Tk(p)f = 0. By (a), cp = 0. Let p | N but p2 f TV'. Lemma 9.26 and the same argument show that Tk(p)f + P*-lu>Pf = 0. By (a) and (b), cPf = Tk(p)f = -p$-lwpf = -pT-iA(p)/. Hence cp = — p"2A(p). The Euler product expansion follows from Theorem 9.21 and these evaluations of cp when p | N. The functional equation is by Theorem 9.8.
CHAPTER X L FUNCTION OF AN ELLIPTIC CURVE 1. Global Minimal Weierstrass Equations Let E be an elliptic curve over Q. The L function of E is a certain Euler product that takes into account information about the reduction of E modulo each prime p. This section will deal with some preliminaries that make the definition invariant under admissible changes of variables over Q. From the start we may assume that the equation is as in (3.23) with integer coefficients. The discriminant A will then be an integer, and the p-adic norm will satisfy |A|p < 1 with equality if and only if p f A. An equation (3.23) is called minimal for the prime p if the power of p dividing A cannot be decreased by making an admissible change of variables over Q with the property that the new coefficients are p-integral. It is the same to say that |A|P cannot be increased by such a change of variables. The equation (3.23) is called a global minimal Weierstrass equation if it is minimal for all primes and if its coefficients are integers. Before considering existence and uniqueness questions for these notions, it will be helpful to have close at hand detailed formulas for an admissible change of variables. Such a change of variables is given as in (3.43a) by x = u2z' + r and y = u3y' + stiV + *. (10.1) The effect on the coefficients a* of the Weierstrass equation (3.23) and of the related coefficients 6;, c,-, and A is given in Table 10.1. The new coefficients are denoted by primes. 290
1. GLOBAL MINIMAL WEIERSTRASS EQUATIONS 291 ua'j: «2o'2= u3a'3-- u4a'A-. u6a'6-- «26'2= «464= »%= u%-- uV4= «v6= «12A'= =01 + 2s =a2 — sai + 3r — s2 =a3 + rai + It =04 — «03 + 2ra2 — =a6 -f ra4 + r2a2 + =62 + 12r =64 + r62 + 6r2 =66 + 2r64 + r262 4 =68 + 3r66 + 3r264 =c4 =C6 =A (< + r3- -4r3 + r3& rs)c ta3 2 + H + -<2 3r< 3r2-2st — rta\ Table 10.1 Effect of an admissible change of variables Lemma 10.1. Suppose p is a prime and all the coefficients a, in (3.23) are p-integral. If |A|p > p"12 or \c4\p > p-4 or |c6|P > p~6, then the equation is minimal for the prime p. Conversely if p > 3 and |Alp < p~12 and \c^\p < p~4, then the equation is not minimal for the prime p. Remark. The proof will show how constructively to achieve minimality simultaneously for all primes p > 3. Proof. Suppose a change of variables (10.1) leads to a system of p-integral coefficients {aj} with 1 > |A'|P > |A|P. Since u12A' = A, we have |ti|p2|A'|p = |A|P, so that \u\p < 1. Then \u\p < p"1, and |A|p = Hp2|A'|p<p-12.l = p-12. The arguments for c4 and c$ are similar. Conversely let p > 3 and |A|p < p-12 and |c4|p < p"4. Then (3.31) gives 1728A = c\-cl. Since |1728|p = 1, we see that |c6|p < p"6. From §111.2, there is an admissible change of variables leading from (3.23) to y2 = x3 - 27c4x - 54c6
292 X: L FUNCTION OF AN ELLIPTIC CURVE with discriminant A' = 2l2312A. If we make a change of variables (10.1) with u = p and r = s = t — 0, we are led to y2 = x3 - 27(cAp-4)z - 54(c6p-6). This has p-integral coefficients, since |c4p~4|p < 1 and |c6/>~6|p < 1, and the discriminant A" = p~12A' has |A"|P = p12|A'|p = p12|A|p. Hence the given equation was not minimal for the prime p. Proposition 10.2. Fix a prime p and an elliptic curve E over Q. (a) There exists an admissible change of variables for E over Q such that the resulting equation is minimal for the prime p. (b) If E has p-integral coefficients, then the change of variables in (a) has uyr,s,t all p-integral. (c) Two equations that are minimal for the prime p and that come from E are related by an admissible change of variables in which \u\p = 1 and r,s,t are p-integral. Proof, (a) Without loss of generality we may assume E has p- integral coefficients (or actually integral coefficients). Then |A|P < 1. Since the range of | • |p is discrete away from 0, |A|P can be increased only finitely many times if we are to maintain |A|P < 1. Hence in finitely many steps, we can pass to an equation minimal for the prime p. (b) Let E have coefficients {af}, and let the minimal equation have coefficients {<*(•}. Since |A'|p > |A|p, we must have |u|p < 1. From (3.24), we see that all {6,} and {b^} are p-integral. Suppose p ^ 3. If \r\p > 1, then the equation for u6b'8 in Table 10.1 has 3r4 as strictly the largest term in p-norm on the right side, contradiction; we conclude \r\p < 1. If p = 3, we can argue similarly with ueb'e and the term 4r3 to see that |r|p < 1. Similar arguments with u2af2 and — s2, and then with u6a'6 and —t2 give |s|p < 1 and \t\p < 1. (c) We apply (b) to the change of variables relating two minimal equations, finding that |u|p < 1 and that r, s,t are p-integral. Applying (b) to the inverse change of variables, which involves ti"1, we see that KMp <1. Thus |t*|p = 1. Theorem 10.3 (Neron). If E is an elliptic curve over Q, then there exists an admissible change of variables over Q such that the resulting equation is a global minimal Weierstrass equation. Two such resulting global minimal Weierstrass equations are related by an admissible change of variables with u — ±1 and with r, s,t in Z.
1. GLOBAL MINIMAL WEIERSTRASS EQUATIONS 293 PROOF. The uniqueness is immediate from Proposition 10.2c, and we are to prove existence. For existence we may assume that E has integer coefficients a,. For each p dividing A, choose an admissible change of variables {upy rPisp,tp} over Q such that the resulting equation has coefficients af>p and is minimal for the prime p. By Proposition 10.2b, the rationals upirpysp^tp are p-integral. (10-2) If the new discriminant is denoted Ap, then Table 10.1 gives |Up|p2|Ap|p = |A|p. (10.3) Let us write tip = pd'Vp with |vp|p = 1. (10.4a) Define u = Y[pd' (10.4b) We shall make an admissible change of variables {w,r, s,t} in the original equation that leads to an equation with integer coefficients a'{ and discriminant A'. Since u12A' = A, we have |A'|P = |u|;12|A|p = |«p|p-12|A|p = |AP|P (10.5) by (10.3). Thus the new equation is minimal for all p, hence is globally minimal. For each p with p | A, let us write rp = pf>vmp/np with mp and np in Z and with |mp|p = |np|p = 1. Let npl be an inverse to np modulo p6d*. We set up the congruence r = pp'mpnpl mod p6^. (10.6) By the Chinese Remainder Theorem, we can find an integer r such that (10.6) is satisfied for all p with p | A. Then \npr - pppmp\p < p'6d' and \r-rp\p<p-«d> for all p. Similarly we can find integers s and t such that I* - *p\p < P'6dp and \t -tp\p< p~6dp for all p. Our admissible change of variables {u,r, s,t} is now defined, and we are left with showing that the new coefficients {aj} are integers. We
294 X: L FUNCTION OF AN ELLIPTIC CURVE check for all primes p that |ai|p < 1,..., \a'6\p < 1, using the formulas of Table 10.1. For p f A, there is no problem: Since |ti|p = 1 and r, s,t are integers, we have |aj|p < 1. For p | A, we estimate each \a\\p. The estimates are similar, and we illustrate with a2 only. We have u2a2 = <*2 — sa\ + 3r — s = (a2 - spax + 3rp - s£) - (s - sp)ax + 3(r - rp) - (s2 - s2p) = u2pa'2p - (s - sp)ai + 3(r - rp) - (s - sp)(s + sp) W\IW pl"2lp ^maxd^lpla^^, |(s_5p)ai|pt |3(r-rp)|p, |(s - sp)(s + sp)\p) < max{|tij|p, \s - sp\py \r - rp\p} by (10.2) < max{|u2|p, p"6^} < \u2p\p by (10.4a). By (10.4), |t«|p = |up|p. Thus \a2\P < 1, and the proof is complete. The argument in Theorem 10.3 is constructive, provided we know how to produce, for each individual p, an equation that is minimal for the prime p. The proof of Lemma 10.1 shows how to produce such an equation for primes > 3, and an algorithm of Tate, which we do not discuss here, handles the cases p = 2 and p = 3. 2. Zeta Functions and L Functions To define the L function of an elliptic curve E over Q we assume that E is given by a globally minimal Weierstrass equation. This condition is no loss of generality in view of Theorem 10.3. For each prime p, we consider the reduction Ep of E modulo p. This curve was introduced in §V.2 and is defined over Zp. It is singular if and only if p | A. The singular cases were discussed in §111.5. In both the nonsingular and the singular cases we define aP=P+l-#£P(Zp), (10.7) where Ep(2p) is as usual the set of projective solutions. The local L factor for the prime p is the formal power series given by 1 ifpfA b l-apu + pu2 M«) = { / (10.8) if p I A. 1 - apu
2. ZETA FUNCTIONS AND L FUNCTIONS 295 The L function of E is the product of the local L factors, with u replaced in the pth factor by p~s: ^E>=n[ir^]n [,_.,,-'•+,■->•]■ (10-9> An elementary convergence result for this Euler product is given in the next proposition. This result will be improved in the next section. Proposition 10.4. (a) For every prime p, \ap\ < p. (b) For p \ A, the reciprocal roots of 1 — apu + pv? are < p in absolute value. (c) The Euler product defining L(syE) converges for Re s > 2 and is given there by an absolutely convergent Dirichlet series. Proof. The members of Ep(Zp) include oo and cannot consist of more than two other points for each x in Zp. Thus 1 < ##p(Zp) < 2p+ 1 and \ap\ < P- This proves (a). The reciprocal roots are ^(ap ± Jet* — 4p), which is < \ap\ in absolute value; thus (a) implies (b). By Proposition 7.6, (b) implies (c). When p | A, we can calculate ap exactly. According to §111.5, when there is a singularity, there is only one, and it is classified as a cusp, a split case of a node, or a nonsplit case of a node. Proposition 3.11 counted the nonsingular points in #£p(Zp) in each case. Adding one for the singularity, we arrive at the following formula for ap when p \ A: a0 = < 0 for the case of a cusp +1 for a split case of a node (10.10) — 1 for a nonsplit case of a node. Although it is not necessary for the logical development, we shall give some indication of how the local L factors Lp(u) arise. An arithmetically defined L function typically is part of a more naturally defined zeta function, or a variant of such a function. In the case at hand, the zeta function Z(t/, Ep) is a generating function that encodes how many points are on the curve in each finite extension of Zp. If Fpn denotes the field of pn elements, the definition is *(„,*,)=.,pIe*£2>>:
296 X: L FUNCTION OF AN ELLIPTIC CURVE The definition is arranged so that additive formulas for #£,p(Fp») make multiplicative contributions to Z(u, Ep): Operationally one calculates with the formula u±\ogZ(u,Ep) = f^#Ep(Fp.)un. n=l For our elliptic curve, calculation of Z(u,Ep) leads to a combination of three polynomials, two appearing in the denominator and one in the numerator: 1 -apu + pu2 1 — apu (i-u)(i-p«) ifpfA ifp|A. These three polynomials have separate significance, and the product over p of the (reciprocal of) any of them (after substitution of u = p~s) is in principle a meaningful object. However, \\ *; ~ and n ^j \Z7 are just £(s) and C(s — 1) and give no useful information about E. The remaining polynomial leads to L^s^E), which encodes a great deal of information. 3. Hasse's Theorem The goal of this section is to establish the following improvement of Proposition 10.4a. Theorem 10.5 (Hasse). Let E be an elliptic curve over Q with integer coefficients. For each prime pf A, let Ep be the reduction modulo p. Then \p+l-#Ep(lp)\<2Vp. (10.11) Corollary 10.6. The Euler product defining L(s, E) converges for Re s > | and is given there by an absolutely convergent Dirichlet series. Proof of corollary. Let p f A. If ap = p+ 1 - #£P(ZP), the reciprocal roots r of 1 - apu -h p«2 are r = ^(ap ± yJcL* — 4p). By Theorem 10.5, the square root in this expression is imaginary. Hence |r|2 = ±(a% + (4p - a%)) = p and \r\ = y/p. The corollary therefore follows from Proposition 7.6.
3. HASSE'S THEOREM 297 The proof of the theorem that we give is due to Manin. In order to be able to normalize E, we first dispose of the cases p = 2 and p = 3. For these values of p, we have p < 2^/p. In these cases (10.11) therefore follows from Proposition 10.4a. For p > 3, we can make an admissible change of variables that does not affect the condition p \ A, does not change #2?P(ZP), and brings the equation of E into the form y2 = x3 + ax + b. (10.12) We may therefore assume from the outset that E is given by (10.12). We shall work with the nonsingular cubic y2 = X* + aX + > x3 + ax + b y ' defined over the field Zp(x) of rational functions with coefficients in Zp. Two solutions of (10.13) are (X, Y) = (s, 1) and (X, Y) = (x*\ (x3 + ax + 6)*<"-1>). If we specify oo as identity, the projective solutions of (10.13) over Zp(x) form a group, by Theorem 3.8, and we form the group element Zn = (*',(x3 + ax + J)***-1))) + n(x, 1) (10.14) for each integer n, —oo < n < oo. We define a corresponding sequence of integers dn > 0 as follows: If Zn = oo, then dn = 0. Otherwise Zn is of the form (Xn, Yn); in this case we reduce Xn to lowest terms in Zp(x), and we let dn be the larger of the degree of the numerator and the degree of the denominator of Xn. Let us isolate the statements of the two main steps. Lemma 10.7. d_Y - d0 - 1 = #£p(Zp) - p - 1. Proof. By (10.14) with n = 0, dQ = p. Let Np = #£P(ZP) - 1 be the number of affine solutions. We are to prove that d.x = Np + l. (10.15) When the addition formulas of (3.73) are recomputed to take into account the form (10.13), we obtain (x — x*y
298 X: L FUNCTION OF AN ELLIPTIC CURVE Clearing fractions, we see that X-i = (x - xp)2 with deg(fl) < 2p. Therefore the degree of the numerator is 1 greater than the degree of the denominator, and the degree of the denominator is cLi — 1. We shall compute the degree of the denominator after reducing X_i as much as possible. The denominator splits over ~LV as J^[j€Z (x — j)2. The linear factors x — k of the numerator of the fraction in (10.16) are those such that the numerator vanishes at k. In terms of the Legendre symbol (referring to quadratic residues), we have The factor [l+(i3+a.;+6) ^(p-1)] vanishes if and only if [ J + aj j = — 1, and in this case x — j occurs as a factor of the numerator exactly twice. The other factor [j3 + aj + b] vanishes if and only if x — j is a factor of x3 + o.x + 6, and in this case x — j occurs as a factor of the numerator exactly once. The factors that remain in the denominator are (.-*)» when (j3 + apj + b) = +1 and x-j when (f^'J^) = 0- They correspond exactly to the affine Zp solutions of Ep, the first ones giving two values of y and the second ones giving just y = 0. Hence the number of surviving factors in the denominator, which we saw earlier is <f_i - 1, is exactly Np. This equality is (10.15), and the lemma is proved. Lemma 10.8. For —oo < n < oo, (fn_1+dn+1 = 2(fn-h2. (10.17) Proof. First suppose one of Zn_!,Zn,Zn+1 is oo. Then neither of the other two is oo, in view of (10.14). Say Zn = oo. Then dn = 0 and Zn+l = (x, 1), Zn_x = -(x, 1) = (x, -1). Hence dn+i = dn-\ = 1, and (10.17) is verified. Say Zn_i = oo. Then dn_i = 0 and zn - [x, i), zn+1 _ [4{x3+ax+by yn+ij
3. HASSE'S THEOREM 299 by a recomputation of (3.74). Hence dn = 1 and <fn+i = 4, and (10.17) is verified. The remaining case, with Zn+i = oo, is similar. Now suppose that none of Zn-i,ZniZn+i is oo. Write Xn-iyXni Xn+i in lowest terms as The addition formulas allow us to express Xn_x and Xn+\ in terms of Xn as follows: y _ -(<?* + P)(Qx - Pf + (l + yn)2(x3 + ax + 6)03 n-1 ~ Q(Qx - Pf V ' (10.18) r _ -(Qx + P)(Qx - Pf + (1 - y„)2(x3 + ax + 6)^ n+1 C?(Qx - P)2 Addition of the two formulas (10.18) and use of (10.13) give 2(^-1 + -Xn+i) _ -(Qx + P)(Qx - Pf + (x3 + ax + b)Q3 + ((%)3 + q(g) + b)Q3 Q(Qx - Pf PQx2 + P2x + axQ2 + 26Q2 + aPQ (10.19) (Qx - P)2 Multiplication of the two formulas (10.18) and some manipulations give X X _(P*-qQ)2-46Q((?x + P) nn_ An-lA„ + i - _ (10.20) Now also, and the claim is that BD = (Qx-Pf, (10.21) up to a Zp factor. If S is the greatest common divisor of AC and BD, (10.20) gives XC = S[(Px - aQf - AbQ(Qx + P)] (10.22a) BD = S(Qx-Pf, (10.22b)
300 X: L FUNCTION OF AN ELLIPTIC CURVE while the numerator of (10.19) gives AD + BC = 2S[PQx2 + P2x + axQ2 + 2bQ2 + aPQ]. (10.22c) Let F be a prime factor of 5. Then (10.22b) shows that F | BD. Without loss of generality, suppose F \ B. Then F \ A since GCD(A, B) = 1. Since (10.22a) shows that F \ AC, F \ C. Thus F | BC. By (10.22c), F | (AD + BC). So F | i4£>. Since F t ,4, F | D. Then F | C and F I £, in contradiction to GCD(C, D) = 1. We conclude that 5 is a scalar, and (10.21) is proved. The integers in (10.15) are (fn_! =max{degy4,deg#} <fn+1 = max{degC,degD} dn = max{degP,degQ}. We divide the proof of (10.17) into the following cases: (a) dn_! = deg 4 and dn+x = degC (b) dn_i = degB and </n+i = degD (c) rfn_i = deg .4 and dn+i = deg£>, but not Case (a) or (b) (d) dn_i = degB and dn+i = degC, but not Case (a) or (b). Case (a). By (10.21) and (10.22), we have dn-x + dn+x = deg(AC) = deg[(Px - aQ)2 - 4bQ(Qx + P)]. If deg P > degQ, then the unique term on the right of highest degree is P2x2\ so the right side is dn + 2, and (10.17) follows. If degP < degQ, then (10.21) gives degBD = 2 degQ + 2. Then deg(AC) <max{2degP + 2, 2 degQ, 2degQ+ 1, deg P +degQ} <2degQ + l<deg(5£>). But this inequality contradicts the hypothesis of Case (a). Hence degP < degQ is impossible. Case (b). By (10.21), we have dn-i + d„+i = deg(BD) = deg[(Qx - P)2]. If degQ > degP, then the unique term on the right of highest degree is Q2x2; so the right side is dn + 2, and (10.17) follows. If degQ < degP, then (10.22a) gives deg{AC) = deg(P2x2) > deg(P2) > deg(BD).
3. HASSE'S THEOREM 301 But this inequality contradicts the hypothesis of Case (b). Hence degQ < deg P is impossible. Case (c). Since we are not in Case (a) or (b), AegA > degB and degD> degC. Thus deg AD > deg AC, deg AD > deg BD, deg AD > deg EC. (10.23) From the third of these and from (10.22c), we have deg(4£>) = deg(AD + BC) = deg[PQx2 + P2x + axQ2 + 2bQ2 + aPQ]. (10.24) IfdegP>degQ, (10.24) is < deg(PV) = deg(^C), in contradiction to the first inequality of (10.23). If degP < degQ, then (10.24) is < deg(<?2*2) = deg(BD), in contradiction to the second inequality of (10.23). Thus Case (c) is impossible. Case (d). This case is symmetric with Case (c) and similarly is impossible. This completes the proof of the lemma. It is an easy matter to combine the two lemmas to prove the theorem. Induction forwards and backwards by means of Lemma 10.8 gives the formula dn = n2 - (d_! - d0 - l)n + d0. (10.25) (The base cases for the induction are n = 0 and n = — 1, for which (10.25) is trivial.) Substitution from Lemma 10.7 and use of d0 = p gives dn = n2 + apn + p, where ap = p + 1 — #E?{ZP) as in (10.7). The d^s are degrees of polynomials and are therefore > 0. Moreover, two consecutive dn's cannot both be 0. Since ap is an integer, it follows that r2 + apr + p > 0 for all real r. The discriminant of this polynomial must be < 0, and thus \ap\ < 2y/p. This completes the proof of Theorem 10.5.
CHAPTER XI EICHLER-SHIMURA THEORY 1. Overview We return to the theme announced in the preface and discussed a little in §§VII.5 and VIII.l. We saw in Chapter VII that the Riemann zeta function £(s) and the Dirichlet L functions L(s,x), which encode subtle arithmetic information about primes, can be obtained as Mellin transforms of certain 6 functions that have transformation laws akin to those of modular forms. In other words, £(s) and L(s,x) arise as automorphic L functions. Consequently they have analytic continuations and functional equations, and their analytic properties are more manageable. Our objective in this chapter and the next is to address the corresponding problem for the L function L(s, E) of an elliptic curve over Q. We would like L(s,E) to have an analytic continuation and satisfy a functional equation, and the likely way to obtain these conclusions is to identify L(s, E) as an automorphic L function. A special case of a theory of Eichier and Shimura gives a clue to the nature of the automorphic objects to use. Their theory takes certain cusp forms / in S2(ro(N)) and gives a geometric construction of elliptic curves over Q such that L(s,E) = L(a,f). We shall discuss this theory in the present chapter. The Taniyama- Weil Conjecture anticipates that every elliptic curve over Q is obtained in this way from S2(To(N)) if the Eichler-Shimura map is followed by a relatively simple kind of map called an "isogeny." This conjecture will be the subject of Chapter XII. By way of introduction we begin with a construction that passes from members / of S2(To{N)) to homomorphisms from T0(N) into the additive complex numbers. For / in 52(r0(iV)), /(£)d( is invariant under T0(N). Fix t0 in H and define Wr)= fT f(C)dC (11.1) J T0 302
1. OVERVIEW 303 Since / is analytic, F(t) is independent of the path of the integral. For 7 € ro(A0, tne invariance of /(C) d( gives riKT) niT) riKTo) H7(r)) = / /(C) K = / /(C) d( + / /(C) dC 1 /(CK+/ f(C)dC = F(r)+ /(C) dC r0 ^r0 ^r0 (115 (11.2) Put 7(ro) */(t)=/ /(C) dC We claim that $7(7) is independent of To. In fact, if we define F\(r) by (11.1) with ri in place of r0, then (11.2) gives Fi(j(t)) = F1{t) + /7Tl /(C) dC (11.3) Since Fi = F -f J^0 /(C) <*C, comparison of (11.2) and (11.3) gives / /KK=/ /«K, as asserted. Proposition 11.1. For / in 52(r0(iV)), $j is a homomorphism of To(N) into the additive complex numbers. If 70 € To(N) is elliptic (i.e., |Tr To| < 2) or parabolic (i.e., |Tr -yol = 2), then $/(7o) = 0. Proof. The invariance of /(C) ^C under T0(N) gives fYiyiTo rl\T0 r7i72*"o */(Ti72) = / = / + / J T0 J To Jl\To p\T0 p2T0 = + =*/(7i) + */(72), Jto Jtq and $/ is a homomorphism. If 70 € Fo(N) is elliptic, it has finite order and hence so does its image in C. Thus $/(7o) = 0. Suppose 70 = (aebd) is parabolic in To(N). Since |IY 70) = 2 and det7o = 1, we readily find that the equation : = z has a double cz + a
304 XI: EICHLER-SHIMURA THEORY root, namely z = ^. Thus yo(^) = ^. Choose 0 in SI(2,Z) with ^ = /^(oo). Then 71 = PloP"1 has Ti(oo) = /?7o/T V) = /?7o(^) = P(*±) = 00. Thus 71 = ± ( J J J for some / in Z. By (9.17), the integer / is a multiple of the width h of the cusp /J"1 (00) = ^f1. The Fourier expansion (/o[tf-1W(r) = EcBeWBT'* n=l has zero constant term since / is a cusp form, and the analog for Chapter IX of (8.7b) therefore gives r (/o&?-i]2)(c)dc=o for any T\. Hence also r+(/o[^-i]2)(odc=o. (n.4) Now the change of variables (J = j3 X(Q gives */(to)= / hc)<k'= / f(rlQ6tf-lxr2dc J To JflTo JffiTo + l 0rn and this is 0 by (11.4). This completes the proof. Let us consider the relatively simple case that N = 11. According to Theorem 9.10, S2(ro(ll)) is one-dimensional. A nonzero element is /W = »?(r)2»/(llr)2 = f;cn9n (11.5) n=l = q - 2q2 - q* + 2q4 + qs + 2q6 - 2q7 - 2q9 - 2?10 + «» - 2q12 + 4q13 + 4g14 _,i5_V6-2,17 + 4g18 + 2920 + ...,
1. OVERVIEW 305 according to Example 5 in §IX.3. The group ro(ll) turns out to be generated by r-(i!)- "-=(-33 i,)' «-(ii,)- Since T is parabolic, Proposition 11.1 shows that the image of $j is generated by ^/(V^) and $/(Ve). We have */(7)=/ /(O dC J To = ^-f "I c»?n_1 d* = ^(#(«2,r<7(To)) - H(e2*"°)), Z1TI Je2wir0 *~f ZTTl where H(q) = £ <W i n n = l Judicious choice of tq cuts down considerably on the number of terms needed. For r0 = i, computation with 10000 terms gives 11 digits accuracy. For r0 = .125 -f .025* with 600 terms, we obtain H(e2rir°) = .2628106125215867 + .5230455366717368/ H(e2TiV^To)) = .897415264661364 - .9357710802667581i ^(e2WV6(To)) = 1.1532019916801141 + .523045536671738** with 14 digits accuracy. (With 300 terms, w», would have had 10 digits accuracy.) Thus Wl = $f(V4) = -.232177875650357 - .101000467297158i u>2 = *j(V6) = -.202000934594317f. ( ' ' The complex numbers u\ and u>2 are independent over R and therefore define a lattice A in C. For any 7 in To(N)y (11.2) shows that F(j(t)) — F(t) is in A, hence projects to the 0 element of C/A. Thus the map F of (11.1), which initially carries 7i into C (and hence also C/A), descends to a holomorphic map of H = ro(ll)\7i into C/A. In turn, Theorem 6.14 exhibits a biholomorphic map of C/A onto E(C) for an elliptic curve E. If X0(ll) denotes the compactification of H discussed briefly in the proof of Theorem 9.9 and called TV there, then F actually yields a holomorphic map of the compact Riemann surface X0(ll) onto C/A and then onto E(C).
306 XI: EICHLER-SHIMURA THEORY Two miracles occur in this construction. The first miracle is that Xo(ll), E, and the mapping can be defined compatibly over Q. We have not yet defined rational maps and rational projective curves (other than plane curves), and we shall need to make such definitions in order to make our assertion precise. We return to a discussion of rationality shortly. The second miracle is that the L function of E matches the L function of the cusp form / in (11.5). This assertion relies on introducing new interpretations of Hecke operators and their eigenvalues. Let us give a numerical indication of why E can be defined over Q. Substituting from (11.6) into (6.50), we find that 02(A) = 64419.8788704867 - .0000000003i 93(A) = -5699399.99557174 + .00000002i. Then (8.2) and (8.3) give j(A) = -757.672637860052 + .000000000008i. (11.7a) The actual value is j(A) = -^ '35^ = -757.672637860057. (117b) The significance of j(A) for the present context is explained in Proposition 3.7. With the value of j(A) as in (11.7b), the proposition shows that E can be defined over Q. Let us classify the inequivalent Q structures on E and see that there are infinitely many of them. It will turn out that any two of them are equivalent over a quadratic extension of Q. If E ultimately is given in global minimal Weierstrass form, then the integer tuple (c4,C6,A) determines the Q structure of E, and the equation of E is unique up to the kind of admissible change of variables described in Theorem 10.3. Since ; = £1 = 1728-4 we readily find that the only possibilities are c4 = 24 • 31m2 c6 =-23 • 2501m3 (11.8) A= -ll5m6
1. OVERVIEW 307 for an integer m ^ 0. We shall see that there are some restrictions on m. The curve y2 = x3 - 4mx2 - 160m2x - 1264m3 (11.9) has c4 = 24(24 • 31m2), c6 = 26(-23 • 2501m3), A = 212(-ll5m6) and thus must arise from E by an admissible change of variables with u = 2. We claim that m has no odd prime square factor p2. In fact, otherwise we could use u = 1/p in (11.9) and change variables to find that E was not minimal at the prime p. Similarly 16 does not divide m, since otherwise we could use u = 1/4 to find that E was not minimal at the prime 2. If m = 1 mod 4, then (119) is equivalent with the curve y2 + y = x3 - mx2 - 10m2x - ±(79m3 + 1), (11.10a) which has (c4,C6> A) given by (11.8) and is therefore global minimal, by Lemma 10.1. If m = 2 mod 4, then similarly (119) is equivalent with y2 = x3- mx2 - 10m2x - 158(±m)3, (11.10b) which has (c4,c6,A) given by (11.8) and hence is global minimal. If m = 3 mod 4, then we can attempt a reduction at the prime 2 by rewriting (11.9) as (y + ax -f b)2 = (x + c)3 - 4m(x -f c)2 - 160m2(x -f c) - 1264m3 or y2 + 2axy + 2by = x3 + (3c - 4m - a2)x2 + (3c2 - 8mc - 2ab - 160m2)x + (c3 - 4mc2 - 62 - 160m2c - 1264m3). To have a reduction, we need 2J to divide the usual coefficient a; of this equation. From a3 we obtain 4 | b. From a4 we obtain 2 | c. Then a2 gives 2 | a, and a4 gives 4 | c. Hence a6 gives 64 | (62 + 1264m3). Writing b = 46', we see that 4 | (6/24-79m3). But this is impossible since 79m3 = 1 mod 4. Thus (11.9) is already global minimal if m = 3 mod 4.
308 XI: EICHLER-SHIMURA THEORY If 4 | m, write m = 4m' with 4 \ m'. Then we can reduce (11.9) to y2 = x3- 4m'*2 - 160m'2s - 1264m'3, (11.H) and we must have arrived at a global minimal form. Since we have established that 4 \ m', the preceding paragraph shows that m' = 3 mod 4. Putting everything together, we see that we get a Q structure whose global minimal equation satisfies (11.8) exactly when m has no odd square factor and m is congruent modulo 16 to one of 1,2,5,6,9, 10, 12, 13, 14. (11.12) The simplest Q structure comes from m = 1, and the global minimal equation can be taken as E: y2 + y = x3 - x2 - lOx - 20. (11.13) This is the equation of interest. The Q structures corresponding to other values of m are called twists of E. We have obtained a mapping of Xo(ll) onto E(C). There are other elliptic curves that X0(ll) maps onto. Define (3)-(i ;)(£)■ and let A' = 1u\ 0 Zu^. Calculating as with A, we find that j(A') = -372.363636363639 - .000000000004t\ The actual value is 163 j(A') = -— = -372.363636363636. The result is an elliptic curve E1 that can be defined over Q. The simplest Q structure corresponds to the equation E': y2 + y=x3-x2 (11.15) with (c4,C6,A) = (16,-152,-11). The same analysis as with E yields twists for the same mJs as in (11.12), but only (11.15) will be of interest to us.
1. OVERVIEW 309 The inclusion A'CA yields a holomorphic homomorphism E'(C) 2 C/A' —► C/A ~ E(C). (11.16a) Such a map is called an isogeny. (Cf. JVT.4.) This particular isogeny can be defined over Q. Namely if (x,y) satisfies (11.15) and if X and Y are defined by 1 2 1 x-1 (r-1)2 then (X,y) satisfies (11.13). To any isogeny corresponds a dual isogeny in the reverse direction. Namely (11.14) yields /5w2\ _ (b 0\ fu2\ \buij " ^0 h) \uij _/5 -3\/l 3\ /w2\ _ (h -3Ww2\ ~ ^0 1 )\Q 5)\Ul)-\0 1 Awi/' Since C/(5A) = C/A, we obtain a holomorphic homomorphism £?(C) S C/(5A) — C/A' ~ £"(C). (11.16b) Since (11.16a) is defined over Q, so is (11.16b). But we shall not write down explicit formulas. In any event, composition of Xo(ll) —► E(C) with (11.16b) yields a rational map of Xo{H) onto E'. Similarly if we define «)-(!!)(:)■ and let A" = Zu;" 0 Zwj, then the actual value of j is „ (37S376)3 J(A)- jj—-, and we are led to the equation E" : y2 + y = x3 - x2 - 7820* - 263580, (11.18)
310 XI: EICHLER-SHIMURA THEORY as well as various twists. The inclusion A" C A yields an isogeny #"(0) -> E(C)f and this isogeny is defined over Q. The dual isogeny yields by composition a rational map of Xo(N) onto E". Thus A'o(ll) maps rationally onto the three curves E, E\ and E*\ all as a result of the one cusp form / in (11.5). Elliptic curves that are isogenous over Q have the same L function (Theorem 11.67). By the Eichler-Shimura theory the common L function for these three curves is the same as the L function as /. The maps of X0(N) to £?, £", and E" are called modular parametrizations of these curves. These maps are determined by giving their x and y coordinates, which are meromor- phic functions on Xo{N) with some additional property to reflect the rationality. The rationality is important: / gives us holomorphic maps of X0(ll) onto the twists of E, E1, and E"\ but we do not get a match of L functions in these cases. So far, we have discussed in detail XQ{\\) only. The special feature of this case is that Ao(ll) has genus one and that correspondingly S2(ro(H)) has dimension one. For general r0(AT), Proposition 11.1 says that any cusp form / in S2(Yo(N)) yields a homomorphism <bj : Y0(N) —► C that annihilates the elliptic and parabolic elements, but the situation is of interest only when the image of $j is a lattice. It turns out that we do get a lattice when / is a newform and all the eigenvalues of the Hecke operators are integers. Then the resulting elliptic curve E is defined over Q, and so are Xo(N) and the map from Xq(N) to E. Moreover the L functions of / and E match. The approach to these results is a little indirect. The "Jacobian variety" J(Xq(N)) of Xq(N) is a certain complex torus with the universal mapping property that any map of Xq(N) to an elliptic curve must factor through J(Xo(N)). Because of the universal property, the construction of the map from Xq(N) to E amounts to obtaining E as a quotient of J(X0(N)). The construction does not yield an equation for E directly. Instead, such an equation has to be worked out from side information; we shall take up this matter in §XII.3. A certain amount of this chapter is about background material. Sections 2-4 deal with compact Riemann surfaces in general and with Xo(N) in particular. Some of the results that we quote about compact Riemann surfaces will not be used in the form as stated but are helpful as motivation for properties of projective curves later in the chapter. Section 5 applies some of the more elementary properties of compact Riemann surfaces to derive properties of Hecke operators on S2(Yo(N)). We shall see that S2(Yo(N)) has a basis in which the Hecke operators are given by integer matrices. Consequently the eigenvalues of the Hecke operators are algebraic integers. Sections 6-8 deal with general projective curves
2. RIEMANN SURFACE X0(N) 311 and show how Xo(N) can be given a canonical Q structure. Sections 9 and 10 discuss abstract elliptic curves, isogenics, abelian varieties, and the algebraic Jacobian variety. The main results of Eichler-Shimura theory are in §§11-12. 2. Riemann Surface Xo(N) In this section we shall give a more careful construction of the com- pactification X0(N) of Tq(N)\H than we gave in §IX.5. In §§3-4 we shall go on to discuss some generalities about Riemann surfaces. We omit many proofs, giving references in the section of Notes at the end. Let C = QU {oo}, the set whose equivalence classes form the cusps of §IX.2. Put H* = H UC, and topologize H* as follows: A basic open set about a point of H is an open disc wholly within 7<, and a basic open set about the member oo of C is {Im r > r] for each r > 0. If x £ Q is in C, a basic open set about x is of the form £>U{r}, where D is an open disc in H of radius y > 0 and center x + iy. The resulting topology on 7i* is Hausdorff, H is an open subset, and SX(2,Z) acts continuously. Let Xo(N) = To(N)\Hm. It is easy to see that Xq(N) is compact. Moreover, with some effort one can show that Xo(N) is Hausdorff. A key step in the argument is Lemma 11.2 below. For z in 7{*} let Tz — b€To(N)\y(z) = z}. Lemma 11.2. If r is in W, then there exists an open neighborhood VT of r in U with VT n y(VT) = 0 for all 7 in ro(A0 not in TT. The set VT may be taken to be stable under TT. We shall introduce a system of charts {(tf„^) !*€«•} onXo(N) (11.19) that makes the compact Hausdorff space Xo(N) into a Riemann surface. Let 7r : W* —► Xq(N) be the quotient map. If V is open in H*, then *~l(*(V)) = L^ervN) *t(y) is open; thus n is an open map. In defining the charts, there are cases and subcases. Case 1. zq = t is in %. We choose VT as in Lemma 11.2 and let UT = tt(Vt); Ut is open since ir is open. Case la. TT = {±/}. Then -k : VT —* UT is a homeomorphism. We let <pT be its inverse, and then (UTi(pT) is the required chart. Case lb. z0 = r is in W, and FT ^ {±1}. Since TT D {±1} and To(^) Q SL(2, Z), the order of TT is 4 or 6; we write 2n for this number.
312 XI: EICHLER-SHIMURA THEORY Let A be a linear fractional transformation from H onto the unit disc Z — T carrying r to 0, e.g., X(z) = r. Then <pT : UT —> C is well defined Z — T by the formula y>r(7r(2r)) = Hz)n> anc* (Ut,<Pt) is the required chart. Case 2. z0 = x is in C. Choose /? in SX(2,Z) with /?(x) = oo. Then /Jr,/?-1 = {± ( J "jM | m G Z}, where A is the width of the cusp. Let Vx = p-\{\m t > 2}). If t € T0(N) has y{Vx) D Vx ± 0, then /?7/T^{Im r > 2}) R {Im r > 2} ^ 0. Writing fiy/3~l = ( * * J and taking r in the nonempty intersection, we have 2<Im^V) = ^<^<^. Thus c = 0, f$yP~l fixes oo, and y fixes x. In other words, y is in Tx. Put £/x = v(Vx). Then y>r : £/r —* C is well defined by the formula <px(ir(z)) = e^WO/^ and (Ux,<px) is the required chart. Proposition 11.3. The charts in (11.19) are compatible, and Xq(N) becomes a compact Riemann surface. 3. Meromorphic Differentials This section treats meromorphic and holomorphic differentials of compact (connected) Riemann surfaces, with particular attention to Xo(N) and elliptic curves. We omit proofs of results that do not specifically address these applications. Thus let X be a compact Riemann surface, and let {(£/», <Pi)}iei be an atlas. The meromorphic functions on X form a field, which we denote by K(X). A system lj = {<*>i}iei of scalar-valued meromorphic functions LJi on Ui is called a meromorphic differential if Ui o tpT1 = fa o <pTX){<Pj ° ^f1)' °n <Pi(Ui n Uj) C C (11.20a) whenever Ui H Uj / 0. The classical notation that is used for a;,- o ipj1 in order to capture the transformation law is uii((pjl(z))dz, where z is a local parameter. In this notation if w = <pj o ipj (z), then dw = (<Pj°<P7l)'{z)dzy and (11.20a) says that "i{<P7\z))<lz = Uj{<pj\w))dw. (11.20b) The space of meromorphic differentials is denoted il(X). If F is in K(X), then dF = {(f °ft:1)'o^}ig/ is a meromorphic differential. In the development of the theory of compact Riemann surfaces, one of the early steps is to prove the following.
3. MEROMORPHIC DIFFERENTIALS 313 Proposition 11.4. If X is a compact Riemann surface, then the space Q(X) of meromorphic differentials is nonzero. If uj = {u>t}t€/ Is a meromorphic differential, then u; has a meaning in every compatible chart. Namely if (J70fy>o) is a'compatible chart, then we can adjoin u;o : Uq —► C to {u«}«e/ if w0 is defined by <*>o = (wi)({<Pi °<PolY ° <Po) on Un n ^i- A meromorphic differential is a hoiomorphic differential if all the ljj are hoiomorphic functions. The space of hoiomorphic differentials is denoted ^hoi(^)- If w = {ui}i£i is a hoiomorphic differential, then cj is locally of the form df. Namely let (Uo,<po) be a simply connected compatible chart. Then woo^J"1 is analytic on <po(Uo) and is the derivative of an analytic function g. If we define / = g o ip0 on (7o, then (f0(PolY0(Po = ^Oi as asserted. We say that u> locally has an indefinite integral. Let y2 + aixy + a3y = x3 + a2*2 + <*4* + <*6 be the equation of an elliptic curve E over C. In Corollary 6.32 we saw that E(C) is a compact Riemann surface of genus 1. Formally we have 2ydy + aixdy+ axydx + a3dy = (3x2 + 2a2x + a4)dx and therefore dx dy 2y + aix + <i3 3x2 + 2a2x + a4 - a\y' (11.21a) At points of E(C) where — ^ 0, x is a local coordinate of £(C), and ay the left side defines (locally) a hoiomorphic differential. At points where — ^ 0, y is a local coordinate, and the right side defines (locally) ox a hoiomorphic differential. Together these sets of points cover E(C) except for the point at oo. In terms of affine local coordinates (X, 1,W) for projective space, we obtain a similar relation from (3.23a): -dW 3X2 + 2a2XW + aAW2 - a{W = 3a6W2 + 2aAXW + a2X2 - 2a3W - ayX - l' ( ' '
314 XI: EICHLER-SHIMURA THEORY The relationship between the two coordinate systems is [(A",1,W)] = x 1 [(x,y, 1)]. Thus X - - and W = -. In particular, dW = -y-2 dy. y y Calculating, we find that the left side of (11.21b) matches the right side of (11.21a) on the overlap. The common value of (11.21a) and (11.21b) is therefore a (globally defined) holomorphic differential on E(C). It is called the invariant differential for E> for reasons given below. In the theory of compact Riemann surfaces, one of the preliminary results used for the Riemann-Roch Theorem is the following. Proposition 11.5. If X is a compact Riemann surface of genus g, then the vector space fthoi(^) of holomorphic differentials has dimension 9- In the case of E(C), the genus is one, and the invariant differential is the unique holomorphic differential, up to a scalar. We know from Corollary 6.32 that E(C) is biholomorphic with C/A for a lattice A. Then dz on C has a meaning on C/A and defines a holomorphic differential on C/A. By Proposition 11.5, it agrees with the invariant differential in (11.21), up to a scalar. The invariance of dz under translations of C/A, which we know correspond to group translations in E(C), is the reason for the name "invariant differential" for (11.21). Now consider Xo(N). Let n : TV —► Xo(N) be the quotient map. If u> = {^»}»e/ ls a holomorphic differential on Xo(AT), then we define a function f„ : W —► C by Mr) = u>i(w(T)){fPiowy(T) (11.22) if ([/,-, <fi) is a chart about tt(t). Proposition 11.6. The map u —► fw is well defined and is an isomorphism of Qhoi(^o(^)) onto S-2(r0(N)). PROOF. To see that fw is well defined, let (Ui:<pi) and (Uj,(pj) be two charts with tt(t) in Ui PI Uj. Let U — <pi(n(T)) and tj = <Pj(n(r)). Then <pj o n = (<pj o ^r1) o (<pi o w) implies (W ° *)'(*) = (W ° <P7l)\U){<Pi o *)'(r). Hence "Mt)){<Pj °*)'(t) = (^oip-^iti^ipjO^yiU^owyir) = ("i°<P7l)(ti)(<Pi°*)'(r) by (11.20a) = u;,(7r(r))(^o7r)/(r),
3. MEROMORPHIC DIFFERENTIALS 315 and /a, is well defined. It is clear that /^ is analytic. If 7 is in T0(N)y then tt(t) = ir(jr)y so that U(jr) = Ui(n(jT))(<pi o 7tY(jt) = Wi(w(r))(^ o 7T o jY(t)/Y(t) = 6(y,Tfu,i(ir(T))(<pio*)'(T) Hence fw is an unrestricted modular form of weight 2 and level N. For the behavior at the cusps, let x = /?_1(oo) be a cusp, and let (UXi<px) be the chart about x given by (11.19). Here Ux = ir(p-l{lmT>2}) and p,0r(/rlr)) = ea*<T/\ where h is the width of the cusp. Then we have fu,°W-1h(T) = Mp-lT)6(p-\T)-2 = U,x(w{rlT)){<p,OWy(p-lT)6{rl,T)-2 = w.o^, (e ) ff-i/(r) ^ >r) = «,o^1(c2"T/fc)^c2,r*r/fc. (11.23) Since the differential is holomorphic, the first factor is an analytic function of qh = e2*tTfh. The remaining factor is a multiple of g/». Therefore fu is holomorphic at the cusp /?_1(oo) and vanishes there. Clearly the map u —► fw is one-one. To see that it is onto, let / be in 52(r0(AT)) and let x be in W*. Form the chart (Ux,<px). For z in £/r, choose r in 71* with z = ^^(^(t)). Define «.(*) =/0-)((p,o*)'(r))-1 if r is in W. Modular invariance makes u;x well defined. Also u;x is holomorphic everywhere if x is not in ?r(C), and it is holomorphic except at :c if x is in ^r(C). In the latter case, to define u>x at x (the only point where r would be forced to be a cusp), we unravel the argument (11.23) to see that u;r is bounded in a neighborhood of x. By the Riemann removable singularity theorem, ux extends to be holomorphic at x. Then it is easy to see that lj = {wr}r6^« is a holomorphic differential that maps to /.
316 XI: EICHLER-SHIMURA THEORY 4. Properties of Compact Riemann Surfaces This section will summarize, without any proofs, some deeper properties of compact Riemann surfaces. The initial three main theorems about compact Riemann surfaces are the Riemann-Roch Theorem, Abel's Theorem, and the Jacobi Inversion Theorem. All are expressed in the language of divisors. Let X be a compact Riemann surface. We denote by Div(X) the free abelian group on the points of X. An element D of Div(X) is called a divisor. Such an element is written as D=Y^ ordx{D)x with ordx(D) € Z. (11.24) We write DY > D2 if ordx(i?i) > ordx(D2) for all x in X. If / is in K(X)X and x is in X, choose a chart (Ui,<fi) with x, € £/,-, and put ord,(/)=ordVj(,)(/o^-1). (11.25) The right side is > 0 at a zero and < 0 at a pole. The expression (11.25) does not depend on the choice of ({/,-, <pi). The divisor [/] of / 6 K(X)X is given by [/] = 5>rd,(/)*. Such a divisor is called a principal divisor, and the set of principal divisors is denoted Div0(X). The quotient group Pic(X) = Div(X)/Div0(X) is called the divisor class group. By Proposition 11.4, X possesses nonzero meromorphic differentials u). Let [u>] be the divisor of u, defined in the same way as [/]. The divisor class of [u] is independent of the choice of u;. In fact, if u; and u/ are two such, then reference to (11.20a) shows that u; = fu>' with / in K(X)xy and thus the assertion follows. The divisor class of [u;] in Pic(X) is called the canonical class. For a divisor D, the linear system L(D) is the vector space L(D) = {f G K{Xr I [/] + D>0}U{0} = {/ € K(X)* | ordx(/) +ordx(£>) > 0 for all xSX}U {0}. Then L(0) = C since the only holomorphic functions on X are the constants. Also it is clear that D>D' implies L{D)DL{D'). (11.26) The next result limits how much L(D') can fail to be all of L(D).
4. PROPERTIES OF COMPACT RIEMANN SURFACES 317 Proposition 11.7. Let X be a compact Riemann surface, let D' be a divisor, and let x be in X. Then dimL(D' + x) < dimZ,(D') + 1. Consequently L(D) is finite-dimensional for each D in Div(X). The degree deg(D) of the divisor D in (11.24) is £r€X ordx(D). A proof of the following result was sketched in §IX.5. Proposition 11.8. Each principal divisor on a compact Riemann surface has degree 0. Theorem 11.9 (Riemann-Roch Theorem). Let X be a compact Riemann surface of genus g> let D be a divisor, and let W be in the canonical class of Pic(X). Then dim L(D) = deg(Z>) + dim L(W -D)-g+l. Corollary 11.10. Let X be s. compact Riemann surface of genus g, and let W be in the canonical class of Pic(X). Then deg W = 2g — 2 and dimL(W) = g. Corollary 11.11. Let X be a compact Riemann surface of genus g, and let D be a divisor with deg(D) > 2g - 2. Then dimL(D) = deg(D) - g + 1. Corollary 11.12. Let X be a compact Riemann surface of genus g > 1. Then there is no point of X where all holomorphic differentials vanish. Until further notice in this section, we assume that our compact Riemann surface X has genus g > 1. Since X is a closed orientable surface of genus <7, the ordinary homology group H\(X,T) is free abelian on 2g generators. One usually writes a standard homology basis over Z in the form ai,..., agi b\,..., 6^ with special intersection properties. But we shall be content to fix any Z basis cj,... 1C29 of H\(X,T). Proposition 11.13. Let X be a compact Riemann surface of genus g > 1, and let u>i,... ,ug be a basis of fihoi(^) over C. Then the 2g vectors ( \ in Cs are linearly independent over R.
318 XI: EICHLER-SHIMURA THEORY Consequently the vectors in the proposition are a Z basis for a lattice A(X) in C*. The lattice A(X) is unchanged if {c*} is replaced by a different Z basis of Hi{X,l). If {u>j} is replaced by a different Z basis of fthol(^O) the effect is to transform A(X) by a member of GL(g,C). The Jacobian variety of X is the (/-dimensional complex torus J(X) = CVA(X). We define a map * : X — J(X) by fixing a point x0 in X and setting *(«) = { fay}' . (11.27) According to the previous paragraph, if the base point is fixed, the pair (J(X),$) is well defined up to the action of a member of GL(y,C) simultaneously on the Wj's, C5, and A(X). Proposition 11.14. For a compact Riemann surface X of genus > 1, the map $ of X into J(X) is well defined and holomorphic, and its rank (over C) is everywhere 1. The first statement of the proposition has already been addressed, and the second statement follows from Corollary 11.12. In the case of X = Xo(N)y Proposition 11.6 shows that the map $ is essentially the same as the tuple of maps 9jj of §1 if {/;} is a basis of S2(To(N)). This tuple {$/,} will be used in §§10-11 and will be called <l there. Since J(X) is a group, we can extend $ : X —► J(X) to a map $ : Div(X) —► J(X) in the obvious way: If D = Ylxex ordr(D)r, then *(£>)= £ordx(D)<*(x). For general D} this definition depends on the base point xo in (11.27), but it is independent of x0 if D has degree 0. Recall from Proposition 11.8 that all principal divisors have degree 0. Theorem 11.15 (Abel's Theorem). Let X be a compact Riemann surface of genus > 1, and let D be a divisor. Then D is principal if and only if deg(D) = 0 and $(D) is the 0 element of J(X). Corollary 11.16. (a) If X has genus 1, then $ : X —► J(X) is one-one onto and biholo- morphic. (b) If X has genus > 1 then $ : X —► J{X) is a one-one holomorphic mapping onto a proper complex submanifold of J(X).
4. PROPERTIES OF COMPACT RIEMANN SURFACES 319 With g equal to the genus of X, let Div^>(X) = {D € Div(X) | D > 0 and deg(D) = g). Then we can restrict $ from Div(X) to Div^^(X) and obtain a map $ : Div(^)(X) — •/(*). The special case £ = 1 is what was considered in Corollary 11.16a. Theorem 11.17 (Jacobi Inversion Theorem). For a compact Rie- mann surface of genus g > 1, the map $ carries Div^^(X) onto J(X). Corollary 11.18. As a group, J{X) is isomorphic to the group of divisors of degree 0 modulo the subgroup of principal divisors. Theorem 11.19. If F : X —► T is a holomorphic mapping of a compact Riemann surface of genus > 1 into a complex torus, then F factors through the Jacobian variety: F = / o 4> for some holomorphic mapping / : J(X) —► T that is the sum of a translation and a holomorphic homomorphism. This completes our discussion of the three initial main theorems about compact Riemann surfaces. The final part of this section addresses the nature of the function field K(X), indicating part of the connection between Riemann surface theory and the algebraic geometry of curves. Theorem 11.20. Let X be a compact Riemann surface of genus g > 0, and let x be a nonconstant meromorphic function on X (existence by Corollary 11.11). Then there exist a nonconstant y £ K(X) and an irreducible polynomial P in two variables such that P(x,y) = 0 and A'(X) = C(x)[y]/(P(*,y)). We should think of Theorem 11.20 as giving us a mapping of X into P2(C) by z —► (x(z), j/(z)), with the image contained in the plane curve defined by P. In §7, we shall briefly discuss this approach to defining XQ(N) over Q. Theorem 11.21. Let X and Y be compact Riemann surfaces, and suppose F : K(Y) —> K(X) is a nonzero C algebra homomorphism. Then F is one-one and is implemented by a holomorphic map of X onto y.
320 XI: EICHLER-SHIMURA THEORY 5. Hecke Operators on Integral Homology We now return to the subject of modular forms. We shall examine Hecke operators in detail in this section. Hecke operators act on many things, and they do so consistently. Understanding these consistent actions is a key to unlocking the power of the Hecke operators. One such consistent action is on Hi(Xo(N),Z). Using this action, we shall see that S2(TQ(N)) has a basis such that all Hecke operators T2(n) act by integer matrices. One consequence is that their eigenvalues are algebraic integers. This same realization will allow us also to characterize the algebra generated by the Hecke operators T2(n). Before considering Hi(Xo(N)} Z), we consider points and divisors. Recall from §IX.6 that Hecke operators were introduced as acting on the free abelian group generated by modular pairs (A,C), where A is a lattice in C and C is a cyclic subgroup of C/A of exact order N. The set of equivalence classes of such pairs, with equivalence defined essentially by Cx, is really Xq(N) without the cusps, and thus we should expect an action of Hecke operators on the free abelian group generated by the points of X0(N). This is the divisor group. Rather than try to unwind our original definition, however, we shall start afresh. In §IX.6 we defined M(n, N) as the set of 2-by-2 integer matrices of determinant n whose upper left entry is prime to N and whose lower left entry is divisible by N. As in Proposition 9.12, we write M(n,N) as a finite disjoint union K M(n,^) = (Jr0(iV)a,-. (11.28) •=i If t is in 7i*, let [r] be the corresponding member of Xo(N). For such t we define K r(»)(r) = 5>'T1 (11.29) i=l as a member of D\v(X0(N)). If the af-'s are replaced by different coset representatives in (11.28), then it is clear that the right side of (11.29) is unchanged. Let us see that T(h)(t) depends only on [r]. In fact, if 7 is in r0(AT), then M(nyN)y C M(n, N). Hence we can write ctij = 7iaj(0 (11.30)
5. HECKE OPERATORS ON INTEGRAL HOMOLOGY 321 with 7,- G To(iV), and we readily check that i —► j(i) is a permutation of {1,... tK}. (11.31) Thus T(n)(7r) - T(n)(r) = £ [ai7r] - £ far] i i i i (11.32) = £ ([7."i(i)T] - [ai(Orl) t = 0 by (11.31). Equation (11.32) says that (11.29) is a consistent definition ofr(n)[r]. Thus T(n) has been defined as a Z linear map of D\v(X0(N)) to itself. This suggests a natural definition of T(n) on 1-cycles on Xo(N). If we think of a 1-cycle as a sum of loops on Xo(N), we just need a definition of T(n) on a loop. We lift the loop to a path from To to some jto in H*, writing the path as [to,7To], and we try to define K T(n)[TQ,jT0] = £[r0,7<To] (11.33) •=i with ji as in (11.30). Then we hope that this definition descends to homology. But there is much to check—that the definition depends only on the homotopy class of the loop, then that is independent of the lift, and finally that it depends only on the homology class. The motivation for an efficient approach comes from Proposition 11.1. Proposition 11.22. Let Tep be the (normal) subgroup of To(7V) generated by all elliptic and parabolic elements, and let (•)ab refer to a group modulo its commutator. In terms of a specified point tq in 7i*, there is a canonical isomorphism Hi(X0(N), Z) £ ro(AOab/r£. (11.34) Remark. To simplify matters, we shall always assume that r0 is neither an elliptic fixed point nor a cusp.
322 XI: EICHLER-SHIMURA THEORY Proof, We remove from Xo(N) the finite set of images of elliptic fixed points and cusps (parabolic fixed points) under IVAf); the remaining set will be called Xq(N)' , and its preimage in 7t* will be called W. Then e : TV —► Xo(N)' is a covering map. We shall use r0 and e(ro) as base points for fundamental groups, but we drop them from the notation. Let tQ(N) = rQ{N)/{±\}. Then To(N)\H' = XQ(N)' shows that t0(N) is the group of deck transformations of TV over Xo(N)' and that fo(N) ac^s transitively on the fibers. Hence e+(iri(TV)) is normal in *i(Xf>{N)') with quotient f0(AT). Let P : M^oW) —♦ *x(Xo(AO')/«-(*i(«')) S f0(JV) be the quotient homomorphism. Let y0(N) = T0(N)\Hy i.e., Y0(JV) is X0(N)' with the images of the elliptic fixed points restored. We shall use the Van Kampen Theorem to show that *i(Y0(N))*r0(N)/rei where fc is the (normal) subgroup generated by the elliptic elements. Visualize restoring one point to Yq(N) at a time. We apply the Van Kampen Theorem to a small disc about this point and the partially restored Y0(N). Let / be a simple loop about the point within the small disc. The theorem says that the effect on the fundamental group of restoring the point is to convert I into a relation, i.e., to factor by the normal subgroup generated by /. If /i,..., lr are the loops used as all the points are restored, the result is that *x(Yo(N))~7rl(XQ(N)')/(lu...,lr) in obvious notation. Using p, we have a natural homomorphism onto: WX{Y0{N)) S f,(W)/(llr . . , fr) —> f 0(tf )/p(/l, ■ • • ,'r). (H-35) Suppose an element a of ?ri(Xo(A0') maps to the identity coset. Then p(<r) is a product of elliptic elements. Under p, however, the elements /i,... ,/r map to elliptic elements that generate rc. Hence p(cr) = p(<To) for some <7o in (/i,..., /r). Adjusting <r by a^1 and changing notation, we may assume that p(cr) = 1. Then a is in e(ni(H')). But the punctures of TV get filled in during the passage from X0(N)' to Y0(N). Thus a is contractible in Y0(N) and must be in (/i,..., /r). Thus (11.35) is one-one and is an isomorphism.
5. HECKE OPERATORS ON INTEGRAL HOMOLOGY 323 Now we pass from Yq(N) to Xo(N) in the same way, restoring the images of the cusps one at a time. At the start we have *i(YQ(N))~r0(N)/re. Each restoration of a point has the effect of adjoining a new relation (by the Van Kampen Theorem), and the relations are easily seen to be parabolic generators corresponding to each cusp. The end result is that *i(Xo(tf)) = f0(A0/rep - r0(tf)/iv Finally we have HiiXoiN),!) SJ JT1(X0(iV))«b S (r0(JV)/rv)"b. We can readily check that (ri/r2)ab^r?b/rf whenever T2 is normal in T\, and then (11.34) follows. Now we can make T(n) act on Hi(Xo(N),T). Guided by (11.34), we regard H\(Xo(N),Z) as the free abelian group on symbols [7], 7 6 To(N), subject to the rules bv»] = [7i] + N (n36) [7] = 0 if 7 is elliptic or parabolic. This correspondence of symbols [7] and cycles c respects integration. Namely suppose / E ^(^(A^)) corresponds under Proposition 11.6 to u € fthoi(^0 and [7] corresponds to c. Then / f(r)dT= u>. (11.37) «/t0 Jc (Recall from (11.13) that the left side is independent of r0.) To see this, let / be a loop in Xq(N) that maps to c in homology, and lift / to a path /in H*. Then / goes from r0 to some point 7(7*0) in W*. For this 7 and c, the two sides of (11.37) are equal, and 7 does correspond to c since / projects to the loop / mapping onto c in homology. Let the compact Riemann surface Xo{N) have genus g. We know that Hi(Xq(N),Z) is free abelian on 2g generators, and hence so is the isomorphic group of fyj's; in particular, it is torsion free. We can make (11.33) into a rigorous definition by setting K n*)M = !>.•]> (n.38) 1=1 with 7, as in (11.30).
324 XI: EICHLER-SHIMURA THEORY Proposition 11.23. The equation (11.38) makes T(n) into a well defined Z linear operator on Hx(Xq(N), Z), independent of the coset representatives a,- in (11.28). Moreover, if c is a 1-cycle on Xq(N) and u; is a holomorphic differential (corresponding to a member / of 52(r<)(N))), then / w= /t(m)u;, (11.39a) JT(n)c Jc where T(n)u is the holomorphic differential corresponding to T2{n)f. Proof. To see that T(n) is well defined on the group of fyj's, we have to check that T(n)[7J = 0 if 7 is elliptic or parabolic and that n»)[Ti72] = T(n)[Tl] + T(n)[72]; these conclusions are enough, according to (11.36). First we check this identity. Let 71 and 72 be in T0(N)y and write OfTi = 7iQj(i) an<* «jT2 = 7J<**(j)- Then a,(7iT2) = 7*«j(i)72 = (7i-7J(o)**«(0)» so that (11.38) gives T(n)[7l72] = J2 [7<7i(0] = E W + E W(*)l • = 1 » = 1 1=1 K K = £ W + £ Ml = r<")fo] + r(n)[%]. 1=1 j=l Thus !r(rc) is compatible with the first relation of (11.36). Suppose 7 is parabolic with a;7 = 7i<*i(i> Let j^(-) be the £th iterate of the permutation j. Then (*rtK = 7W)7*"1 = 7t7j(i)*y<a>(07*"2 = ■ • • = 7i7j(i) • • • 7j(K-i)(0aj(*>(i) = 7f7i(0---7j(«'-O(,)Qft- Since 7 is parabolic, so is <*»7*orfrl = 7i7j(0 • • -7j(K-»(i)>
5. HECKE OPERATORS ON INTEGRAL HOMOLOGY 325 and the right side shows that this element is in To(N). Since (11.38) gives K • T(n)[y] = T(n)[yK] = £ [a.W1] = 0. Since the group of fyj's is torsion free, T(n)[7] = 0. Suppose 7 is elliptic with yp = 1. It is clear from (11.38) that r(»)[i] = f>] = o. Then PT(n)[1] = T(n)[yP] = T(n)[l] = 0J and T(n)[7] = 0 since the group of fyj's is torsion free. Thus T(n) is well defined on the group of fyj's. The thrust of Proposition 11.22 is that no further relations need to be checked in order to transfer T(n) to a well defined operator on tfi(*o(tf),Z). Suppose oti is changed to a\ — ^a,- with £; in T0(N). From 0^7 = yi<Xj(i)t we obtain a-7 = ei<*a = eaiai{i) = emej^a^y Then (11.36) gives K E k-K*7(y=E n+E fa] - E few) •=1 • t • By (11.38), T(n) gives the same result from the aj's as from the a.-'s. Now let c be a 1-cycle on Xq(N). With ro as base point in W, we can choose a path [70,770] whose projection to Xq(N) is homologous to c. By (11.38), we can write r(»)M = £>] •=1
326 XI: EICHLER-SHIMURA THEORY with 7, as in (11.30). By (11.38), equation (11.39a) is the assertion that £/ f(r)dr= T2(n)f(r)dT. (11.39b) To prove (11.39b), we calculate the right side as r7(ro) / ° [ttttaM dr by Proposition 9.12 7(ro) i Jt0 fCti = £/ f(T)dr + J2 f^dT' i ^or.(ro) , •/<»j(t)(ro) To OTi7(To) f(r')dT' under r'= a;(r) ro) 7.<*>(0(r<>) /(r)(fr (*>) i(.)(To) /■7iOry(0(T0) (11.40) Since j() is a permutation, the sum of paths ^[oi(ro))ai(,)(ro)] i is a 1-cycle on W. Hence the first term on the right side of (11.40) is 0 by the Cauchy Integral Theorem. The second term is equal to the left side of (11.39b) since $/(7i) in §1 is independent of base point. Thus (11.39b) is proved, and the proposition follows. In the setting of Proposition 11.23, let us write (c,u) = Ju> (11.41) for c in //i(Z) = Hi(Xo(N)i2)i so that the proposition says (T(n)c,w) = (c,T(n)u,). We can extend (11.41) by linearity to be defined for c in #i(R) = Hi(Xq(N),R) or even for c in #i(C), with w in Qho\(Xo(N)). Let us observe that ((c,a;) = 0 for somec in 7/i(R) and all cj) => (c = 0). (11.42)
5. HECKE OPERATORS ON INTEGRAL HOMOLOGY 327 In fact, let ci,..., c^g be a Z basis of i/i(R), and write c = yj rjfcCfc with rjt € R. If wi,... yug is a basis of fihol(^o(^)) over C, then we have ^rticttUj) = 0 Jb=l for all j. By Proposition 11.13, all the r* are 0. Thus c = 0. This proves (11.42). We would like to say that the c's and u;'s are in dual vector spaces, with the c's in H\{T) constrained to the integer points. But the complex vector space of c's, namely H\(C)y is 2y-dimensional, while the vector space of u/s, namely £lho\(Xo(N))> is ^-dimensional. So we shall first split H\(C) into two pieces of dimension g. The mapping (•)* : r —► — f of 7i into itself induces an involution of X0(N). Namely if r2 = yn with 7 = ( a bJ in T0(N)> then the element 7* = f °c ~d j of To(iV) has r2* = 7*n*. This involution induces an involution on #i(Z), which we extend by linearity to H\(R) and H\(C). The vector spaces have eigenspace decompositions under * that we write as 1 (11.43) Hi(C) = Ht{C)®Hr(Q. Proposition 11.24. (a) The spaces H* and H± each have dimension g, and the pairing (c,w) exhibits the spaces H*(C) and ^lho\(Xo(N)) as dual to one another. (b) The linear extension of T(n) to #i(C) maps H*(C) into itself, and the restriction of T(n) to H*(C) is the transpose of T^n) on nwo\(X0(N)). (c) The intersection Hi(Z) H H?(R) is a lattice in H?(R), and a Z basis for this intersection is a basis for H*(C). Proof. We write the involution on 1-cycles as c —► c*. If we lift the 1-cycles to paths in 7i* without changing their names and if c is given by [ro, ri], then c* is given by [ro*, t\*]. Also if we realize members lj of
328 XI: EICHLER-SHIMURA THEORY Ohoi(^o(AO) a8 functions in S2(Tq(N)), then (11.37) allows us to write the pairing as (c,f) = Jj(r)dr. The space S2(To(N)) has a conjugate-linear involution / —► /* given by f'(r) = -f(r'). We readily check that /* is again in 52(ro(N)) and that (c*J)=Kn forcinffi(C). If / has / = /*, then if has (if)* = —if. Hence the real subspace 52,+ of/'s in 52(ro(iV*)) with f* — f has real dimension g, and so does the real subspace 52,- with /* = —/. If c is in //^(R), then <c,f) = (c\f)=TcJ*) shows that (c, /) is real for / in 52,+ . If dimR Hf(R) > g} we can therefore find some cq ^ 0 such that (co, /) = 0 for all / 6 52,+ . Since (•,•) is complex bilinear, (co,i/) = 0 also. But if is the most general member of 52,-. So (co,52(r0(AO)) = 0. From (11.42) we see that c0 = 0, contradiction. Thus dimg H*(R) < g. A similar argument with c in //f(R) and / in 52,_ shows that dimH#-(R) < g. By (11.43), equality must hold in both cases. (b) In Proposition 11.23 we choose r0 imaginary, so that if c <-► [ro,7r0], then c* «-> [ro,7*ro]. The formula for T(n) is r<»)M=X>i with <*ij — 7fQr;(j). On matrices the operation (•)* is multiplicative. Thus <*i*7* = 7i*oy(i)*- If we use the standard a's as in Lemma 9.14, then a** and c*j(t)* are different members of the same set. Therefore nn)t7-]=X>-]=w»)Mr. By linearity, T(n)cm = (T(n)c)* on #i(C), and it follows that T(n) maps /^+(C) into itself. By (a) and (11.39a), the action of T(n) on H?(C) is the transpose of the action of T2(n) on Qho\(Xo(N)).
5. HECKE OPERATORS ON INTEGRAL HOMOLOGY 329 (c) If c is in #i(Z), then c=±(c + c*) + i(c - c*) in //j*-(R) e i/f(R). Hence Ai = {±(c + c*) | c G #i(Z)} spans i/+(R). On the other hand, A2 = {c + c* I c G #i(Z)} is contained in #i(Z) and hence is discrete. Since A2 has finite index in Ai and since H\(Z) PI H*(R) lies between A2 and Ai, Hi(l) D Hf(R) is a lattice in H?(R). This completes the proof. Theorem 11.25. Fix a Z basis ci,..., c, of #i(Z) fl //^(R), so that c\,..., cg is a basis over C of H*(C). Let f\,..., fg be the dual basis of S2(r0(N)). Then each Hecke operator T2(n) is given in the basis fi j • • •»fg by a matrix with integer entries. Remark . The existence of the bases c\,..., cg and f\,..., fg in the theorem is assured by Proposition 11.24. Proof. T(n) acts on #i(Z), with linear extensions to H\(R) and i/i(C). By Proposition 11.24b, it maps H?(C) to itself, hence Hf(R) to itself. Thus T(n) maps #i(Z) 0 #j*"(R) to itself. Hence it is given in the basis ci,..., cg by a matrix with integer entries. Proposition 11.24 shows that T2(n) is the transpose of T(n), and the theorem follows. Corollary 11.26. The eigenvalues of T2(n) on S2(To(N)) are atee- braic integers. Proof. By Theorem 11.24, T2(n) is given in a suitable basis by an integer matrix. The characteristic polynomial is then a monic polynomial with integer coefficients, and the roots must be algebraic integers. Theorem 11.27. Let c\ G S2{To(N))' be the linear functional that extracts the first Fourier coefficient, and let T be the commutative sub- algebra of Endc(S2(ro(7V))) generated by the identity and the Hecke operators. Then the map L :T —+ S2(To(N))' given by L(T) = cioT (11.44) is one-one onto and exhibits the left regular representation of T on itself as equivalent with the representation of J on 52(r0(Ar))/ = #*(C). In particular, dime T — 9- REMARKS. We continue to write g for the genus of Xq(N). It will be convenient in this proof to drop the subscript 2 on Hecke operators T2(n).
330 XI: EICHLER-SHIMURA THEORY Proof. We can write L(T) = (c\ o T, •), where (-, •) is the pairing of S?(r0(N))' with S2(T0(N)). For T0 and T in T, we have To(L(T))(f) = (T0{c1oT,)J) = {{c,oT,),To/) = (CloT,T0/> = cx(TT0f) and L(T0T)(f) = (c, oT0T)(f) = Cl(T0Tf). These two are equal since T is commutative. Hence L is equivariant. Before proving that L is one-one onto, we make a construction. Let fi,- — yfg be a basis of S2(T0(N))i and write with cn G C9. Our first observation is that {cn}~=1 spans C. (11.45) In fact, let £ be any linear functional on C9 with £(c„) = 0 for all n. Since £ is continuous, £(f(r)) = 0 for all r. (11.46) Let ri,...,rr be such that the set of r vectors I I with WW 1 < i < r is a maximal linearly independent set among the infinite set of all I I. For each r we can then find scalars ai(7"),... ,ar(T) \f.{T)) such that 7i(r)\ //i(n)\ /f^y ! =«i(r) : +... + «r(r) : ,/»W/ \/»(n)/ \/,(rr),
5. HECKE OPERATORS ON INTEGRAL HOMOLOGY 331 It follows that the functions f\,..., fg lie in the span of a\,..., ar. Hence g < r, and the vectors span (X By (11.46), ((C) = 0. Thus £ = 0, and (11.45) follows. For each T in T, we define a g-by-g matrix A(T) by (11.47) If we write T(m)/, = £b„e2*"*T >r(m)/,/ n=l with bn € C*, we have 'T(m)h n = l = A(T(m))^cne2"nT. n=l Equating coefficients, we obtain hn = A(T(m))cn. Referring to Proposition 9.15, we see that bi = cm. Hence cm = i4(T(m))c1. (11.48) We can now prove that L is one-one onto. Suppose that L(T) = 0. Since L is equivariant and T is commutative, we have 0 = T(m)L{T) = L(T(m)T) = L(TT(m)) = ci o 7T(m) (11.49)
332 XI: EICHLER-SHIMURA THEORY for all m. Let e;- be the jth standard basis vector of C, and apply (11.49) to/,. Then /7T(m)/i\ 0 = c1(7T(m)/i) = c1( e;) \TT{m)fJ = Cl(A(T)A(T(m))l : e,) by (11.47) /d(/i)\ = A(T)A(T(m)) \ J ey by linearity of Cl() W/f)/ = A(T)A(T(m))d ■ ti = A(T)cmej by (11.48). Since j is arbitrary, A(T)cm = 0. By (11.45), A{T) = 0. Substituting into (11.47), we have Tfi = • • • = Tfs = 0. Thus T = 0, and L is one-one. Suppose i(T) does not span ^(IV^))'. Choose / ^ 0 with L(T)(/) = 0 for all T G T. (11.50) Write / = £J=1 *,/,-. We have /T(m)A\ L(T(m))fi = Cl(T(m)/;) = Cl( • «,•) Vr(m)/,/ = ci(i4(m)| : •«,-) by (11.47) = i4(m) I I • Cj by linearity of ci(-) W(>»)/ = ^4(m)ci • ej = cmej by (11.48). Multiplying by Zj and summing on .;', we have L(T(m))/ = cm- ( : ) (11.51)
6. MODULAR FUNCTION j(r) 333 for all m. The left side of (11.51) is 0 by (11.50). By (11.45), I \ J is the 0 vector. Thus / = 0, contradiction. We conclude that L is onto. 6. Modular Function j(r) Our objective in this section is to address the special properties of the compact Riemann surface X0(N) that allow us to realize it in a particular way as the set of C points of a projective curve defined over Q. The function j(r) introduced in §VIII.2 will play a critical role in our construction. A meromorphic function / on W is said to be automorphic of weight Jfe and level N if fo[y]k = / for all y £ F0(N) and such that the following condition holds: For every cusp /^(oo) relative to ro(Af), there is some Cp such that / o [P~l]k{T) is analytic for Im r > Cp with an expansion fo\rlur)= f; #>«& n= —oo that has only finitely many ex ^ 0 for n < 0. The space of such functions is denoted Ak(To(N)). It is an enlargement of the space Mie(To(N)) of modular forms so as to allow poles in H and at the cusps. Our interest will be in the space Ao(To(N))f which is a field. This space corresponds to the field K(Xo(N)) of meromorphic functions on Xq(N) in the same way that 52(To(AT)) corresponds to the space of holomorphic differentials. In more detail, let {(£/i,^»)}«e/ be an atlas for Xo(N). A system {/t}te/ of scalar-valued meromorphic functions /; on Ui yields a meromorphic function on Xo(N) if fc o ^r1 = fj o <p~l on <p(Ui fl Uj) for all i and j. Let tt : H* —► Xo(N) be the quotient map. If {fi}i£i is a meromorphic function on Xo(N)) then we define a corresponding scalar-valued function f on'Hby /(r) =/{(T(r)) (11.52) if (Ui,<pi) is a chart about ^r(r). The same kind of argument as for Proposition 11.6 yields the following result: The map {/<} —► /of (11.52) is well defined and is a field isomorphism of K(Xq(N)) onto Ao(Tq(N)). Typically we shall use the same name for a member of Ao(To(N)) and its counterpart in K(Xo{N)). Let us take N = 1 for the moment. The function j(r) in Ao(r0(N)) comes from a member of Ko(Xq(1)), according to the proposition, and
334 XI: EICHLER-SHIMURA THEORY we are going to denote this member of K(Xo(N)) by .;' also. For the case of r0(l) = 5L(2,Z), there is just one cusp, and j(r) has a simple pole there, according to Proposition 8.2. Moreover, j is analytic on H by (8.3). Proposition 11.28. .; : SL{2,T)\H —► C is one-one onto. Proof. Fix zq in C. As a meromorphic function on Ao(l), j — z0 has divisor of degree 0, by Proposition 11.8. Since the only pole is at 7r(oo) and is simple, the divisor must be [j -zo] = P- *(oo) for some p ^ 7r(oo) in 5L(2,Z)\W\ Then j takes the value zQ at p G SL{2,Z)\H and only there. Lemma 11.29. Let / in j4o(Fo(1)) be analytic on 7i and have q expansion /(r) = Y1™=-m cnQn at °°- Then / is a polynomial in .; with coefficients in the module over Z generated by the coefficients cn. Proof. We induct on M for M > 0. For M = 0, / corresponds to an analytic function on Xo(l) and must therefore be constant. Hence / is the polynomial cq in .; of degree 0. Assuming the result for M — 1, we observe from Proposition 8.2 that / — C-mJM has a q expansion 52^L-Af+i dnqn- Since the coefficients of.; are in Z, each dn is in cn + c-mZ. Hence the lemma follows by induction. Theorem 11.30. K(X0(l)) = C(j). Proof. If/ is in X(A"o(l)), then / has only a finite number of poles. If / has a pole at a point p ^ ?r(oo), then Proposition 11.28 shows that U — j(p))~l has a simple pole at p and no other pole. Arguing as in the proof of Lemma 11.29, we can subtract from / a linear combination of powers of (j — j{p))~l and obtain a function with no pole at p and with no new pole. Continuing in this way, we can reduce to the case that / has its only pole at tt(oo). Then Lemma 11.29 completes the proof. Now we consider general N. We have denoted by M*(N) the set of 2-by-2 primitive integer matrices of determinant N. Let us write T = r0(l) = 5L(2,Z). Lemma 9.1 says that M-(N) = r(^ J)r= Q r«, (11.53)
6. MODULAR FUNCTION j(r) 335 and gives an explicit set of coset representatives. The number tp(N) is given by (9.5). The modular polynomial of order N is the polynomial $N(X) = $n(j,X) of degree ip(N) defined by *n(X)= Y[(X-jocn). (11.54) For 7 £ T, we have j o (yati) —jo [7)0 o or, = j o at-; hence $w(X) does not depend on the choice of coset representatives. The first thing to notice about $n is that one of its roots is jpj = The same argument as with Examples 1 and 2 of §IX.3 (-)• shows that both j and j^ are in the field K(Xq(N)). In fact, Theorem 11.32 below will show that jn is algebraic over C(.;) and that &n is its minimal polynomial. The theorem says even more, restricting the nature of the coefficients. Lemma 11.31. With ori,... ,a^,(N) as in Lemma 9.1, the functions j oociy... yj o cty(7v) are distinct on 7i. Proof. Let us write j(T)=1- + P(q), (11.55) where P is a power series with integer coefficients. If a, = I n , j is as in Lemma 9.1 and if qd = e2*iTtd and Cd = e2,r,/rf, then i.«M-i(=£i)--k+«*}>. in. 56) Suppose ofi» = I „ j has j oca = j ° <*»'. Taking the quotient and letting q tend to 0 (i.e., letting Im(r) tend to +00), we obtain tiCd = 4<*- (H-57) a 0! It follows first that - = —. Since ad = aid1 - N and all are positive, a d we obtain a = a' and d = d'. Substituting into (11.57), we obtain 6 = b'. Thus or; = or,/.
336 XI: EICHLER-SHIMURA THEORY Theorem 11.32. The polynomial $n(X) has coefficients in Z[j] and is irreducible over C(j). Proof. Any 7 in T acts on .; o at, sending it to j o or, o 7. By (11.53) the result is some j oaki and hence the effect of 7 is to permute these functions. Since all of the functions contribute once to <£#, the coefficients of $s are invariant under I\ The coefficients are elementary symmetric polynomials in the functions j o oti and thus are analytic on H. The same argument as in Proposition 9.5 shows that each jo or, has a meromorphic qs expansion at 00, and hence the same thing is true of the coefficients of $x. It follows from the Y invariance that the q expansions of the coefficients of $# are meromorphic at 00. By Theorem 11.30, the coefficients of $n are in C(.;). The polynomial $/v splits in the field C(j o a\,..., j o ar^,(jv)), and T acts as automorphisms of this field, fixing C(j) and permuting the roots of $w. Since or, is in roriT, we have or, = 70:17' or 7_1ari = c^it'- Thus T acts transitively on the roots. By Lemma 11.31 the roots are distinct. Consequently $jy is irreducible over C(.;). We are left with showing that the coefficients are actually in 2\j]. Let us write j(r) and joa^r) as in (11.55) and (11.56). When we form the elementary symmetric polynomials in the functions (11.56), we know that the result involves only integral powers of q, and we can see that the coefficients of this resulting series are in Z[0v] since d \ N. Regard the coefficients of the series as in Q(Gv)> and consider the automorphism of this field given by 6v —► £)v> where GCD(r, N) = 1. The expression (11.56) is mapped to ^W + ^sci1), where V is the least residue > 0 of rb modulo d. This expression is just j o ar,'(r), where or,/ = f J. The map (a,b,d) —► (a,b\d) represents a permutation of the representatives oti. Thus the effect of Cn —► Cs on ^e coefficients of the q series expansion of a coefficient of $/v is the same as the effect of a certain permutation of the roots, i.e., to leave things fixed. Since Q(Gv) is Galois over Q, the coefficients of the q series expansion of a coefficient of $at are in Q fl Z[0v] = Z. By Lemma 11.29 the coefficients of^N are in Z[j]. Theorem 11.33. K(XQ(N)) = C(j,jN). The proof will use the following two lemmas.
6. MODULAR FUNCTION j(r) 337 Lemma 11.34. If g € T has j(N(gr)) = j(Nr) for all r€«, then g is in r0(JV). Proof. Let a — ( n 1 1. Since Nr = a(r), we have j(aga~1Nr) — j{Nr) for all t inTi. Hence j{aga"lr) = j(r) for all t inTi. By Proposition 11.28, there exists fiT in T with <*ga-l(T) = PT(T) for all r in Ti. Fix ro in the interior of the standard fundamental domain R of T. The point aga"l(r0) is in the interior of /3To{R) and hence so is otga~l(r) for r near ro- Consequently /?T = ±/?ro for r near r0. By analyticity aga~l = ±/?To- Consequently g is in 5L(2,Z)na-15L(2,Z)a, and this is Tq(N) by Lemma 9.2. Lemma 11.35. Let F be a field of characteristic 0, and let G be a finite subgroup of AutQ(F). Then F is a finite Galois extension of FG = {x e F | v>(*) = x for all p £ G}, and AutFo(F)=: G. Proof. For x G F, let /r(J0 be the member of F[X] given by /.w= II(x-^x))- (1L58) Then /r is in FG[X]. Since /x(x) = 0, x is algebraic over FG and [FG(x) : FG] < |G|. Choose y in F such that [FG(j/) : FG] is maximal. For any z in F, the Theorem of the Primitive Element yields some w in F such that FG(z,y) = FG(w). By maximality [FG(z,y) : FG] = [FG(w) : FG) < [FG(y) : F% and thus z is in FG(j/). Therefore F = FG(y) and [F : FG] < \G\. In particular, F is a finite algebraic extension of FG. Fix an algebraic closure F of F. If tp is in HomFo(F, F), then fy(i>(y)) — 0 since fy has coefficients in FG. Referring to (11.58), we see that tp(y) = <p(y) for some <p in G. Since y generates F over FG, we have ^ = y? on F. We conclude that every member of HomFo(F, F) carries F into itself and coincides with a member of G. The first of these conclusions implies that F is a Galois extension of FG, and the second implies that G is the Galois group.
338 XI: EICHLER-SHIMURA THEORY PROOF OF THEOREM 11.33. Let r(N) be the principal congruence subgroup of level N. Let K = Aq(T(N)) be the field of meromorphic functions / on 7i such that / o [7]0 = / for all 7 € T(N) and such that the following condition holds: For each 0 in 7 = SL(2, Z), there is some Cp such that / o [/?_1]o(t") is analytic for Im r > Cp with an expansion f°[0-1]o(T)= J cWgfr n=-oo that has only finitely many c}{ / 0 for n < 0. Then A0(r0(N)) ^ K(X0(N)) is contained in K. The group T acts on K by automorphisms under / —► / o a = / o [or]0f and r(AT) fixes J<\ Hence G = V/T(N) acts on A. By Lemma 11.35, K is a finite Galois extension of KG = Ar, and the Galois group is G/Go, where Go is the subgroup of G that fixes K. A member / of Kr has / = / o [or]0 for all a € T, hence is in K(Xo(l))> which Theorem 11.30 identifies as C(j). Thus K is a Galois extension of C(j) with Galois group G/Go. Let us determine the Galois group Gf of K over the intermediate field C(j,jjv). Kg is in G', we can regard y as a member of T satisfying joNo^ = joN. By Lemma 11.34, 0 is in r0(A0- Conversely T0(N)/r(N) does fix C(jJN). Hence G' = (To(N)/T(N))/Gq. By the Fundamental Theorem of Galois Theory, Since K(X0(N)) C A", we have K(X0(N)) C A'r°W. On the other hand, comparison of the definitions of K(Xo(N)) and A shows that K(X0(N)) D Kr«N\ Thus C(j,j>) = Ar°<N) = K(X0(N)). The Q structure on Xo(N) will result from a relationship between K(Xo(N)) = C(j,Jn) and Q(j,j;v) that is given by the equivalent conditions in the next theorem, in which we eventually will take x = j, y = jNj and A?o = Q. The conditions are satisfied as a consequence of Theorem 11.32. Theorem 11.36. If C(x,y) is a field with x transcendental over C and y algebraic over C(x), then the following conditions on a subfield k0 of C are equivalent: (a) y is algebraic over &o(x), and the minimal polynomial of y over ko(x) remains irreducible over C(x), (b) [k0(x1y):k0(x)] = [C(x)y):C(x)]) (c) Cnkoix.y) = k0, (d) koCikoix.y) = k0.
6. MODULAR FUNCTION j(r) 339 REMARK. Condition (d) may be restated as: ko is algebraically closed in k0(xyy). Proof, (a) <* (b). This is clear. (b) => (c). Let u be in C and in ko(x, */)• From the inclusions M*,y) 2 Mx>w) 2 Mx) C(x,y)DC(r,«)DC(*), we have [kQ{x,y) : k0(x)] = [t0(x,y) : M*.W)][M*»W) : k0(x)] (11.59) [C(x,y) : C(*)] = [C(x,y) : C(«,w)][C(x,w) : C(x)], with the left sides equal, by (b). Since y is algebraic over ko(x,u) and Jko(x), we have [kQ(xyy) : k0(x,u>)] > [C(x,y) : C(x,w)] (ll.oO) [*o(*,«):*o(*)]>[C(*,«):C(*)]. Comparing (11.59) and (11.60), we see that actually the inequalities (11.60) are equalities. Since u; is in C, we have [k0{x,u):k0(x)] = [C(x,u):C{x)] = l, and we conclude that w is in ko(x). Let u = Pi(x)/P2(x) with Pi and P2 in k[x] and P2 ? 0. Put P(X) = Pi - wP2(*) in C[X]. If P is not the 0 polynomial, then P(x) = 0 shows that x is algebraic over C, contradiction. So P = 0 and every coefficient of Pi — u>P2 is 0. Since Pi and P2 have coefficients in ko and Pi is not 0, it follows that u is in ko. (c) => (d). This is clear. (d) => (b). First let us show that (d) implies [kQ(z,Lj) : k0(x)] = [Ao(*,w) : *o(x)]. (H-61) In fact, otherwise we can find a finite Galois extension k\ of ko for which [to(x,ai) : *<>(*)] > [ti(*,w) : *i(*)l- Hence [ki(x) : k0(x)] > [fcifar.y) : t0(*,y)]. (11.62)
340 XI: EICHLER-SHIMURA THEORY The left side here equals [k\ : Jbo] because any element of Jbi(x) can have its denominator "rationalized" by Aut*0(Jbi) so as to be written in the form Pi(r)/P2(z) with P\ in k\[x] and P2 in k0[x]. Thus k\(x) is Galois over ib0(x), and k\{x,y) is Galois over fco(x,y). Since k\ is Galois over Jbo, any ko isomorphism of k\ into a field containing k\ sends k\ into k\. Hence every member of Autjb0(riy)(ti(x,y)) sends k\ into k\ and fixes x, thus sends Jbi(x) into itself while fixing fco(x). Thus we have a homomorphism Autfco(r,y)(^i(x,y)) —> AutJko(a.)(Ar1(x)) that is clearly one-one. Let H be the image in Autjt0(x)(ibi(x)). The assumed inequality (11.62) means that H is not all of AutJbo(J.)(fc1(x)). Let k[ be the field with k\ D k[ D ko and k[(x) = k\(x)H; here k[(x) ^ *o(x). Form k[(x}y). Every element of Autjfc0(ry)(fci(x,y)) fixes this, by construction. Hence &J(x,y) = ko(x,y). Thus k[ C ibo(x,y). By (d), k[ = ko. Since k[(x) ^ ko(x), (11.62) has led to a contradiction. Thus (11.61) holds. To complete the proof of (b), we shall show that [k0(x,y) : *„(*)] = [C(x, y) : C(x)], (11.63) even without assuming (d). Let P(Y) 6 ib0(x)[Y] be an irreducible polynomial with P(y) = 0. Multiplying the coefficients by a suitable member of ibo(x), we may assume that P(Y) is in £o[x][Y] and is primitive over ibo[x]. We are to prove that P(Y) remains irreducible over C(x). Suppose on the contrary that P factors nontrivially over C(x). By Gauss's Lemma it factors over C[x]. We can thus write P(Y) = Q(Y)R(Y) (11.64) with Q and R primitive over C[x]. Let us make x explicit by writing (11.64) as P(x,Y) = Q(x,Y)fl(x,Y) (11.65) with P(x,Y) in Ar0[x,Y] and with Q(x,Y) and fl(x,Y) in C[x,Y]. Applying the evaluation homomorphism x —► c G ko, we obtain a factorization of the polynomial P(c,Y) of one variable, and we see that Q(c,Y) and R(cyY) are in ko[Y]. Write Q(X,Y) = J2Qi(X)Y\ We have seen that Qi{c) is in Jb0 for every c in ko. If degQ, = n, we can construct Qf(X) in ko[X] of degree n with Qf (/) = Qi{l) for 1 < / < n. Then it follows that n Qf{X)-Qi(X) = cY[(X-l)
7. VARIETIES AND CURVES 341 with c ^ 0 in C. Evaluating at n + 1, we see that c is in ibo- Hence Qi is in k0[X], and Q(X,Y) is in k0[XtY]. Similarly R(Xt_Y) is in k0[X,Y]. Thus (11.65) contradicts the irreducibiHty of P over k§(x). This proves (11.63), and (b) follows by combining (11.61) and (11.63). 7. Varieties and Curves In this section and the next we shall complete the task of realizing Xo(N) in a particular way as the set of C points of a projective curve defined over Q. So far our discussion of curves has been limited to plane curves. In order to treat Xo(N) adequately, we shall need to carry out two additional steps. First, in this section, we extend our discussion from plane curves to varieties and curves in higher dimensional projective spaces, and we define types of mappings between such varieties and curves. Second, in §8, we characterize curves in terms of their function fields. In the case of Xo(N), we can regard the irreducible polynomial $n(X) of Theorem 11.32 as a polynomial $jv(Xi, A"2) over Q in two variables such that &n(J,Jn) = 0. Theorem 11.33 shows that r —> (J(t)Jm(t)) gives a one-one parametric mapping of Xo(N) into the C points of <£. We could then apply a desingularizing process to obtain a nonsingular curve over Q in a higher dimensional projective space whose C points are biholomorphic with Xo(N). But as we shall see, the theory of algebraic curves provides a faster procedure. By working directly with the function field K(X0(N)) and using Theorems 11.32, 11.33, and 11.36, we can arrive at an appropriate nonsingular curve over Q without working further with explicit equations. The speed of our procedure is achieved at the price of not easily being able to make explicit computations. This disadvantage was already noted in §1, where we mentioned that the equations for the elliptic curves that arise in Eichler-Shimura theory had to be deduced from side information. We begin by assembling background information about varieties and curves in general. We omit many of the proofs. We shall not return to X0(N) until §8. Let ko be a field, and let k be an algebraically closed field with kQ C k. We shall assume that ko has characteristic 0 or is a finite field and that the fixed field of Aut*0(£) is ko. In many situations that will be clear from the context, a definition or result valid for ko is also valid for k. To any ideal / in k[X\>..., Xn], we associate its zero set in kn: Vi = {x = (*!,..., xn) e kn | /(*) = 0 for all / £ 1}.
342 XI: EICHLER-SHIMURA THEORY Any such set Vj is called an affine algebraic set. If V C kn is an affine algebraic set, its ideal (of polynomials vanishing on it) is I(V) ={fe k[Xly... ,Xn] | /(x) = 0 for all x € V), Then I(Vi) D L An ideal in a polynomial ring over a field is finitely generated (i.e., the ring is Noetherian), by the Hilbert Basis Theorem. We say that an affine algebraic set V is defined over k0 if I(V) can be generated by members of kQ[X\y..., Xn]. In this case we often abbreviate "V defined over fc0" by V/kQ, we define I(V/kQ) = I(V)nk0[Xu...,Xn], and we define the set of ko points or ko rational points of V to be v(kQ) = vnks. The group action by Autjtft(ib) on k extends to k[X\y... ,Xn] with all Xj fixed by all of AuU0(Jfc). Then k0[Xly... ,Xn] is the full fixed ring under Autfc0(£). In the case of V/ko, I(V) is mapped into itself by Autjfc0(Ar), and I(V/ko) is the subset of I(V) fixed by Autjt0(Jb). Also V(kQ) is the subset of V fixed by Autjt0(fc). Theorem 11.37 (Hilbert Nullstellensatz). If / is an ideal in k[X\,... ,Xn] and / is in I(Vj) then fr is in / for some integer r > 0. If / is a prime ideal, then it follows from Theorem 11.37 that / = I(Vi), and the situation is more manageable. Accordingly we shall systematically work with this situation (even though we did not do so in Chapter II). An affine algebraic set V is irreducible if it is not the union of two proper affine algebraic sets, or equivalently if I(V) is prime. An irreducible affine algebraic set is called an affine variety. For example, if P(X,Y) is an irreducible polynomial in fc[X,Y], then its zero locus in k2 is an affine variety, being Vj for / equal to the prime ideal (P). The affine coordinate ring k[V] of an affine variety V is the Noetherian ring defined by i[V] = k[Xu...tXn)/I(V). Since I(V) is prime, k[V] is an integral domain. Its quotient field k(V) is called the function field of V; it is finitely generated over Jk since fcfV"] has this same property. In the case of V/kQ) the affine coordinate ring is ko[V] = ko[X1,...,Xn]/I(V/ko). Since I(V) is prime, so is I(Vfko), and therefore k0[V] is an integral domain. Its quotient field k0(V) is the function field of V/k0 and is finitely generated over ko.
7. VARIETIES AND CURVES 343 Also in the case of V/k0i Aut*0(fc) acts on k[Xi>... yXn] and carries I(V) into itself; therefore it acts on k[V]. One can prove that the fixed ring is naturally identified with fco[V]. Similar remarks apply to k(V) and Jfco(K). The dimension of an affine variety V is the transcendence degree of k(V). An affine curve is an affine variety of dimension one. In other words, there is to exist some / in k(V) not in ib, so that k(f) is transcendental over k, and any g in k(V) is to have the property that k(f>9) ls algebraic over k(f). In the case of characteristic 0, it follows from the Theorem of the Primitive Element that k{V) = k(f, h) for some suitable h. (Cf. Theorem 11.20.) Let V C kn be an affine variety, and let f\,...,// be a set of generators for I(V). A point x of V is a nonsingular point if the matrix has rank n — d, where d is the dimension of V. The variety m. V is nonsingular if every point of it is a nonsingular point. These definitions do not depend on the set of generators. There are two different constructs with which we can see this independence. The first one is the easier to use in computations. For it, let x be in K, and let mx be the ideal in k[V] given by mx = {fe ~k[V] I f{x) = 0). This is a maximal ideal in k[V], since evaluation at x is an isomorphism of i[V]/mT with h, and the quotient mx/ml is a finite-dimensional vector space over jb. The following proposition shows that the definition of nonsingularity does not depend on the system of generators. Proposition 11.38. A point x of an affine variety V is nonsingular if and only if the vector space dimension of mx/m^ equals the dimension of V. The second construct for seeing that nonsingularity is intrinsic uses quotients of elements of k[V]. Let k[V]x be the subring of k(V) given by \F = g/h, \g and h are in k[V]} \h(x)^0 This is a local ring in the sense that it has a unique maximal ideal Mx, namely the members vanishing at x. It is called the local ring of V at x. It is Noetherian since k[V] is Noetherian, and it is an integral domain since it is contained in a field. The members of k[V]x are the elements of W)* = {fz Hv)
344 XI: EICHLER-SHIMURA THEORY k(V) with a definite value at x, i.e., F = g/h has F(x) = g(x)/h(x) well defined and finite. Such elements are said to be regular or defined at x. Nonsingularity can be decided by Afx/M|, according to Proposition 11.38 and the following result. Proposition 11.39. The inclusion mx C Mx yields an isomorphism mx/ml ~ Mx/M*. Proof. We get a ring homomorphism mxfmx —► Mx/Mx. If g/h is given in MXi then «■>-'.=H{) (*^) exhibits h(x)~1g in M* as mapping to — -f M^. Hence the map is onto. If g in mx maps to JZ ( i~ J ( "77 J in ^r ♦ we can clear fractions and write hg = ]T)<7i<7t'/i" for an element h of £[V] with h(x) £ 0. Here EflftfW is in ml- The set {/ € *[V] | /$ G m£} is an ideal in k[V] containing mx and also hf which is not in mx. Since mx is maximal, 1 is in the set and g is in m£. Thus the mapping mxfmx —► Mx/M% is one-one. Projective n-space Pn(£) is defined in the same way as with Pzik), namely as the quotient of {(x0,.-.,xn) £ fcn+1 — {(0,... ,0)}} by an equivalence relation, two points being equivalent if one is a £x multiple of the other. Our notation for writing a point of Pn(k) with care is [(x0)... ,xn)]. We let Pn(fco) be the subset of Pn(k) for which x0,... ,xn can be taken in k0. To any homogeneous ideal / in k[Xo,..., Xn], we associate its zero set in Pn(k): Wj = {[(x0,..., *„)] € Pn(k) | /(x0,..., x„) = 0 for all / € /}. Any such set Wj is called a projective algebraic set. If W C Pn(k) is a projective algebraic set, its homogeneous ideal (of polynomials vanishing on it) is the ideal 1{W) generated by all homogeneous / in k[Xo,...,Xn] such that /(x0,...,xn) = 0 for all [(x0,... ,xn)] G W. It is always true that I(Wj) D I. We say that a projective algebraic set is defined over Jbo if 1{W) can be generated by homogeneous members of ko[X0,..., Xn). In this case we abbreviate "W defined over ^o" by W/k0t and we define the set of k0 points or k0 rational points of W to be W(k0) = W C\ Pn(ko).
7. VARIETIES AND CURVES 345 A projective algebraic set W is irreducible if it is not the union of two proper projective algebraic sets, or equivalently if I(W) is prime. An irreducible projective algebraic set is called a projective variety. Projective transformations of Pn(k) onto Pn(k) are defined in analogy with the case of n = 2 that was discussed in §11.1. About any point [(x0,... ,xn)] of Pn(k) we can introduce various systems of affine local coordinates. Namely choose $ in GL(n -f l,jb) with 4>(x0,.. .,xn) = (1,0,..., 0). (For many purposes we ignore (io,...,in), and it suffices to take $ to be the identity or a permutation matrix that interchanges 0 and i.) Then we can define affine local coordinates on [&~l(kx x in)] to kn by the one-one map V([*-1(y0,...,yn)])=f^,...,^). (11-66) V 2/o t/o / If W is a projective algebraic set with homogeneous ideal I{W) and if # is given as above, we shall write W Clkn for y?([$_1(^x x Jbn)]fl W). This is an affine algebraic set with ideal I(W R kn) C i[X\,..., Xn] given by "-M'r *",=F(*",(:^»<-> Conversely if/ is given in k[Xi,..., Xn] of degree d and if $ is given as above, then we can define /// as a homogeneous element of k[Xo,..., Xn] in two steps by gH(X0y...yXn) - X$f ( -rr,...,Tr- ) \Xo Xo/ (11.68) Ih = 9h° $• If V is an affine algebraic set with ideal I(V), then the projective closure of V relative to $ is the projective algebraic set W = V whose homogeneous ideal is generated by {/// G i[Xo,..., Xn] \ f £ I(V)}. If we combine these two constructions, passing from F to / as in (11.67) and then from / to fH as in (11.68), we find that F = (X0o<P)rfH (11.69) for an integer r > 0. As a consequence we can prove the following.
346 XI: EICHLER-SHIMURA THEORY Proposition 11.40. Fix $ in GL(n + 1,4) to determine affine local coordinates. (a) If V is an affine variety, then W = V is a projective variety and v = wnkn. (b) If W is a projective variety, then V = W R kn is an affine variety, and K C W. The inclusion is strict only if Xo o $ is in /(W), and in this case V = K = 0. (c) If $ is in GL(n + 1, fc0) and W is a projective variety with W fl kn nonempty, then W is defined over k0 if and only if W fl kn is defined over fco- Proof. Let us check that I(W) prime implies I(W fl ibn) prime. If homogeneous F and G lead to / and g with fg in J( W fl fcn), then FG vanishes on Wf\kn and (X0o$)FG vanishes on W. Hence (X0o$>)FG is in Z(W^). Then (X0 o$)F or G is in I(W)y and it follows that / or g isinI(Wnkn). Next let us check that I(V) prime implies I(V) prime. If/ and g lead to /// and gfj with /h9h = hjf in /(K), choose h in /(V) leading to /i//. Then fg = h,forg is in /(K), and /ff or gn is in /(K). Then (a) is clear, and (11.69) implies that I(W) C I(V) in (b). Hence V CW. If the inclusion is strict, let fH be in I(V) but not in I(W). Then /// comes from some / in I(V), which comes from some F in I(W). By (11.69), F = (X0o<$>)rfH. Since /(H/) is prime and fH is not in I(W)y X0 o$ is in /(W^). Then 1 is in I(V)y and V = 0. This proves (b), and (c) follows from the definitions. If W is a projective variety, we define the function field k(W) of W to be the field of quotients F(X) = g(X)/h(X) in i(X0,... yXn) such that g and h are polynomials homogeneous of the same degree, h(X) is not in /(VK), and quotients — and — are identified if gh!—g'h is in I(W). In this way, k(W) has a canonical definition independent of <£. We can describe k(W) alternatively using affine local coordinates. Namely (b) shows that we can choose $ in GL(n + 1, ik) so that W fl kn = W. Fix such a choice of $. Using $, we can identify the function field k(W) with k(W H kn). Consequently k{W) is finitely generated over ib. If W is defined over fc0, we can treat *o(W) similarly, as a consequence of Proposition 11.40c; here we must take $ in GL(n + 1, k0). The dimension of W is again the transcendence degree of k{W). If x is in WL we can choose $ so that x is mapped by y? in (11.66) to a point of WC\kn. Then it is meaningful to speak of the local ring k[W]x of W at x. We regard k[W]x as a subring of k{ W)\ it is the subring of elements that are regular or defined at x. The ring k[W]x is Noetherian.
7. VARIETIES AND CURVES 347 Finally we can consistently define x to be nonsingular if dim Mx/M£ = dimly, by virtue of Propositions 11.38 and 11.39. The projective variety W is nonsingular if it is nonsingular at every point. The group Autjb0(ib) plays the same role for projective varieties that it does for affine varieties. We shall not list the details. We shall encounter products of projective varieties and shall want to regard them as projective varieties. First we consider the product of two projective spaces. If Pm(k) has points [(xo, • • • > xm)] and Pn(k) has points [(y0> • • •, 2/n)]> then the Segre embedding, which maps [(x0,...,xm)] x [(yo,...,2/n)] —►[(...,*.■%,...)] = [(...,*•;,...)] in lexicographic order, exhibits Pm(k) x Pn(i) as contained in PM(k) with M = mn + m + n. The image of the Segre embedding is the variety of the ideal generated by all ZijZiiji — ZiyZiij. It is a projective variety in P\f(k). If W C Pm(k) and W' C Pn(k) are projective varieties defined by homogeneous polynomials {f(X)} and {y(y)}, then the product W x W C P\f(k) is the variety of the ideal generated by all ZijZi'j'—Zij'Zi'jy all f(X)Y*eg*, and all g(Y)Xfegg, in obvious notation. It too is a projective variety in Pm(&)• If W and W' are nonsingular, so is W x W. WW and W1 are defined over k0f so is W x W1. Let V\ C Pm(k) and V2 C Pn(k) be projective varieties. A rational map F from V\ to V2 is a tuple F = (/o,..., fn) of members of k(V\) such that for every point x £ V\ where /o,..., fm are all defined, F(x) = [(/0(x),...,/n(x))] is in V2. If V\ and V2 are defined over ^0 and if /o,. -,/n can be multiplied by a common member of £x so as to be in fco(Vi), then F is said to be defined over k$. Example 1. Let Vx be Pi(C) = {[(x0jari)]} with I(VY) = 0, and let V2 be the curve in Pi(£>) = {[(j/o, 2/i, 2/2)]} given by y0y2 = y\, i.e., having homogeneous ideal (y0y2 - y\) in C[j/0>2/1,2/2]. Then F = (xg,x0xi,x?) is a rational map. It is defined over Q. Let G : V2 —* V\ be given by G = (1, — 1. This too is a rational V 2/0/ map defined over Q. Example 2. Let Vi be the elliptic curve (11.15) with /(Ki) generated within C[xo)x1,x2] (in the current notation) by X0X2 + xlx2 - x\ -f x0x?.
348 XI: EICHLER-SHIMURA THEORY Let V2 be the elliptic curve (11.13) with /(V2) generated by x0x% + xgx2 - x\ + x0*i + lOxjfci + 20xq. The map (11.16a) is F = (/o,/i,/2), where /0 = 1 and / / >> Xl _l *o , 2x0 xg /i(*o.«i.«») = - + ^ + ^7^ + (Xl_*0)2 , ^ , »\-x* 2x2 + x0 fxg t xl t xl \ xo x0 \x\ (xi — xo)*5 («i — *oJ / Then F is a rational map, and it is defined over Q. Example 3. The affine curves v2 = 2u4 - 1 and y2 = x3 + 8x are related by the transformations (3.19) and (3.21). If the curves and the transformations are recast in projective form, then the transformations are rational maps. A rational map F = (/o,... ,/n) between projective varieties V\ and V2 is regular or defined at x in V\ if there is some g in k(V\) such that gfi is regular at x and ((*/«>)(*) (*/„)(*)) is not the 0 tuple. A rational map that is regular at every point of V\ is a morphism. Examples. In Example 1 above, both F and G are morphisms. This is clear for F. In the case of G, G is given also by G = ( —, 1 ). \Vt ) Hence the only question is about points of V2 where t/o = t/i = 0. But V\/yo — V2 on V2, and hence is regular at [(0,0,1)]. Example 2 is a morphism, as a consequence of Theorem 11.42 below. It fixes 00 and hence is a homomorphism of the group structures, by Proposition 11.61 below. Its kernel is the 5-element torsion subgroup of Vi/Q. In Example 3, the two rational maps (3.19) and (3.21) are inverse to each other, and thus we say that they are birational. The first curve has a singularity at u — w = 0 if the curve is written projectively in variables (u, v>w), while the second curve is nonsingular. It will follow that the two transformations cannot both be morphisms.
8. CANONICAL MODEL OF X0(N) 349 Two projective varieties Vi and V2 are isomorphic if there are mor- phisms F : Vx -+ V2 and G : V2 -► Vx such that F o G and G o F are the respective identity maps. The two varieties of Example 1 are isomorphic. In the case of elliptic curves as in (3.23), Proposition 11.58 will note that an isomorphism must be given by an admissible change of variables over ib. Thus the current definition of "isomorphism" for elliptic curves agrees with the earlier one. Two projective varieties V\/ko and V2/ko are isomorphic over ibo if the above F and G can be defined over ko. Let us address how maps between varieties affect function fields. If F : V\ —*• V2 is a rational map between projective varieties, we can try to define a ib algebra map F* : k(V2) —+ k(Vi) between their function fields by composition: F*(f) = foF. Some condition is needed, however, for this formula to be meaningful. For example, the image of F should not completely be contained in the set where / is not defined. It turns out that it is enough that F have dense image in a suitable topology. We shall not pursue this point in full generality, however. Suppose instead that F is a morphism and carries V\ onto V2. Then F is called a dominant morphism. In this case F* is well defined and is one-one. Moreover, it carries local rings to local rings: F'(HV2]nx)) C k[Vi\x. Consequently if F is an isomorphism, then F* is an isomorphism of function fields and an isomorphism of local rings at each point. In particular, F carries nonsingular points to nonsingular points. 8. Canonical Model of Xo(N) In this section we shall relate curves to their function fields and then show how to introduce a canonical Q structure on Xq(N). We continue with fields ko C k and assumptions on them as in §7. By a function field of dimension 1, we shall mean any finitely generated extension of k of transcendence degree one. Let C be a projective curve, and let x be a nonsingular point. The key to analyzing C about x lies in the fact that every rational function on C has a well defined order at x. To define the order, we use the following lemma. Lemma 11.41. Let R be a Noetherian local ring that is an integral domain and a ib algebra, and let M be the maximal ideal. Suppose that (a) dirm:(M/M2) = 1, and (b) every element of R not in M is a unit.
350 XI: EICHLER-SHIMURA THEORY Then M is principal, being given by M = (t) for any element t of M not in M2. Also f]Zx Ml = {0}. Proof. Let t be in M but not M2. Since /Z is Noetherian, M has a finite set of generators t,su...,sr. Without loss of generality, suppose r is as small as possible. By (a), Sj — Cjt is in M2 for some Cj 6 k. Changing notation, we may assume that our generators t,si,... ,sr of M have the property that each Sj is in M2. Since the products of pairs of generators of M generate M2, we can write sr = at2 + ^2 htsj + 5Z cilSiSl' Thus sr(l-6r<-^cirs;) ^a^ + ^&^s, + H c,-|*j*|. (11.70) j j<r j<l<r The right side of (11.70) is in (<,«i,... ,5r_i), and the coefficient of sr on the left side is a unit, by (b). Hence sr is in (*,si,... ,sr_i), in contradiction to the minimal choice of r (unless r = 0). We conclude that M = (t). Let / be in f|~i M1 = n£i(*')> and write / = ''/i- Since /, = </i+1> we have (/i) C (/2) C (/3) C Since R is Noetherian, (/m) = (/m+i) for some m. Then /m+i = afm = a</m+i, and a< = 1 if /m+i ^ 0. Since / is in My at = 1 is impossible. Thus /m+i = 0 and / = 0. Let us return to the projective curve C and the nonsingular point x. The local ring k[C]x of C at x satisfies the hypotheses of the lemma. If / is in ib[C]r, we define ordr(/) to be the least / > 0 such that / is in Mlx but not M^+1. This is an integer > 0 if / ^ 0 and is +oo if / = 0. Let t be any member of Mx but not M2, so that Mx — (t)\ t is called a uniformizer of C at i. If / ^ 0 and if / = ordr(/), then / = at1 for some unit a in £[C]r. The function field k(C) is the quotient field of ik[C]r, and we can extend ordx() to all of k(C). Namely any F in k(C) is of the form F = f/9 with / and g in k[C]x and g ^ 0. We define ordx(F) = OTdx(f)-oTdx(g). To see that this is well defined, we let / = ordr(/) and m = ordx(g). Then / = at1 and g = btm with a and 6 units in k[C]x. Hence F = ab~ltl~m. So / — m = ordx(F) is characterized as the unique power of
8. CANONICAL MODEL OF X0(N) 351 t such that F/tl m is a unit in k[C]x. Then ordr() has the following properties: (a) ordr(FG) = ordr(F) + ord*(G), (b) ordr(F + G) > min{ordx(F), ordr(G)}, (c) k[C]x = {FG k(C) | otdx(F) > 0}, (d) Mx = {Fek(C)\ordx(F)>0}1 (e) ordr(F) = +oo if and only if F = 0. The function ordr() allows us to prove a theorem mentioned in connection with Example 2 in the preceding section. Theorem 11.42. Let C be a projective curve, let x be a nonsingular point of C, let V be a projective variety, and let F : C —► V be a rational map. Then F is regular at x. Consequently if C is nonsingular, then F is a morphism. Proof. Let F = (/o,..., fn) with fj £ k(C). Let t be a uniformizer at x, and let m = min {ordr(/,)}. Then ordx(^"m/;) > 0 for all j with equality for some j = jo. By (c), rmfj is regular at x, and (<"m/jo)(x) ^ 0 by (d). Hence g = rm exhibits F as regular at x. To each nonsingular point x of a projective curve C, we have associated the local subring k[C]x C k(C) and the function ordr() : k(C) —► ZU{oo} satisfying (a) through (e) above. Let us abstract this situation. A proper subring R C k(C) containing k is called a discrete valuation ring of k(C) over ib if there exists a function v : k(C) —► ZU{oo} (called the valuation) such that (a) v{FG) = v{F) + v{G), (b) v(F + G)>imn{v{F),v{G)}> (c) R={F£k(C)\v(F)>0}. Then R is a local ring with maximal ideal MR = {Fe~k{C)\v(F)>0}. For a nonsingular curve C, the next proposition says that this abstraction captures the functions ordx() completely and gives nothing additional.
352 XI: EICHLER-SHIMURA THEORY Proposition 11.43. Let C be a nonsingular projective curve, and let R be a discrete valuation ring of k(C) over k, with valuation v. Then there exists a point x of C such that v is a positive multiple of ordx(). For this x, R = k[C]x and MR = Mx. Proof. Let (x0,... , xn) be coordinates for the projective space in which C lies. The x/s with x,- G Ic (so that x,- = 0 everywhere on C) will not play a role, and we discard them. For the remaining indices we can regard each Xj/x0 as in k(C). If v(xj/xq) > 0 for all j, we use the standard system of affine local coordinates (with $ = 1) and pass to the affine curve C H kn1 which is not empty since x0jfi 0 somewhere on C. The members of the affine coordinate ring k[Cf)kn] are polynomials in the Xj/x0i and thus v is > 0 on the whole ring. Hence k[C C\ kn] C R. On the other hand, if v(xj/xq) < 0 for some j, let j = jo be an index for which it is smallest. Using affine local coordinates in which $ interchanges 0 and jo, we readily see similarly that CC\kn is nonempty and that k[C n kn] C R. Consequently / = Mr H k[C fl kn] is a proper ideal in £[C fl ibn] = fc[ibn]//c- Let / be the inverse image of I in k[kn}. By Theorem 11.37, all members of I vanish at some x in kn. Since Ic C /, x is in C. Thus every member f of I has the property that ordr(/) > 0. Since C Dkn is nonempty, we can identify k(C) with k(CDkn) and ifc[C]x with k[C n Jkn]x. Suppose F isin MR n *[C n ikn]x. We can write F = f fg with / and g in £[C fl /kn] and oxdx(g) = 0. Since g is in fc[C fl kn] C 72, / = Fg is in 7; thus the previous paragraph gives °rdar(/) > 0. Since ordx(^) = 0, we have ordx(F) > 0. This proves that (t/(F) > 0 and ordr(F) > 0) => ordr(F) > 0 (11.71) for general F in k(C). The contrapositive of (11.71) is ordx(F) = 0 ==> «(F) < 0. Applying this to 1/F, we obtain ordx(F) = 0 => v{F) = 0. (11.72) By Proposition 11.39 we can choose t in ife[CnJbn] that is a uniformizing parameter at x_. Since k[C fl kn] C R, we have v(t) > 0. A general member F of Jb(C) is of the form F = tlF0 with ordx(F0) = 0. By (11.72) this Fhas v(F) = lv(t) + v(Fo) = lv{t) = v(<)ordr(F). Now t; is not identically 0 since R is a proper subring of k(C). Since v(0 > 0) we conclude that v(t) > 0 and that v is a positive multiple of ordr(). This completes the proof.
8. CANONICAL MODEL OF X0(N) 353 Proposition 11.43 gives a clue how to reconstruct a nonsingular projective curve from its function field. The point is that the same idea can be used to associate a nonsingular projective curve to any function field of dimension 1. Theorem 11.44. Let K be a function field of dimension 1 over fc, and let Ck be the set of discrete valuation rings of K over k. Then there exists a nonsingular projective curve C such that k(C) = K and such that Ck gets canonically identified with the set of points of C. REMARK. The curve C is unique up to isomorphism, by Theorem 11.48 below. We shall give a brief indication of some of the steps. First, if B is any integral domain that is finitely generated as a k algebra, then B is isomorphic with the coordinate ring of an affine variety. In fact, let xi,...,xn be generators of B over k. The map Xj —► Xj gives a k homomorphism of k[X\,..., Xn] onto B. Since B is an integral domain, the kernel is a prime ideal J. Then Vj is the required affine variety. Now let us consider the function field K of Theorem 11.44. Let 3 be a member of K not in k. We form the polynomial ring k[x]. Then K is a finite extension of the quotient field k(x). Let B be the integral closure of k[x] in K. When the construction of the previous paragraph is applied to this B, the resulting affine variety turns out to be a nonsingular affine curve. At this stage we bring in Ck- Let R be a discrete valuation ring of K over k, and choose x to be in R (but not in k). Then k[x] C R. One can show that R is integrally closed in Ky and it follows that B C R. The intersection N = Mr f] B is a maximal ideal of B and thus corresponds to a unique point of the nonsingular affine curve built from B. In other words, R has been attached to a unique point of the curve corresponding to B. To prove Theorem 11.44, one has to take the product of the projective closures of finitely many such affine curves, map Ck into this product diagonally, and take the image of Ck as the desired curve C. We omit the details. For our application to Xo(N) it will be important to know conditions under which the curve C in Theorem 11.44 is defined over the subfield kQ. We assume now that k = C. Taking a cue from Theorem 11.36, suppose K — C(f,g) and Kq = k0(f,g). Given R in the argument above, we can always choose x to be either / or 1//, so that x is in Kq. So we need to deal only with two jB's. We are allowed to use more #'s, but we insist that they come from x's in Kq. The heart of the matter
354 XI: EICHLER-SHIMURA THEORY is to get the affine variety determined by B to be defined over ko. This step we address in the following lemma. Lemma 11.45. Let K = C(f,g) be a function field of dimension 1 over C, let k0 be a subfield of C, put Ko — k0(f}g)i and suppose that C fl Ko = ibo- Let x be in I<o but not to, let B be the integral closure of C[x] in K, and let V be a nonsingular affine curve determined by B. Then V is canonically defined over ko. Under this definition, ko(V)S*KQzn<iC(V) = K. Proof. Without loss of generality, / is transcendental over C, and g is algebraic over C(/). Taking into account (c) => (a) in Theorem 11.36, we see that / is transcendental over ko and g is algebraic over fco(/)- Therefore Ko has transcendence degree 1 over ko- Since C fl Kq = Aro, the given element x of Ko is not in C, hence is transcendental over ibo- Consequently {x} is a transcendence basis of A'o over ko. It follows that / and g are algebraic over k0(x) and that K0 is a finite algebraic extension of ibo(x). By the Theorem of the Primitive Element, there exists y in A'o such that Ko = ko(x,y). Then K = C(x,y). Condition (c) in Theorem 11.36 is independent of the two generators and hence still holds when we replace (f,g) by (x,y). By (c) => (a) in the theorem, we see that [Ko:k0{x)] = [K:C(z)]. (11.73) Let n be the common value of the two sides of (11.73). Since C[x] is a principal ideal domain, there exists a basis {xj,..., xn) of K over C(x) consisting of members of B such that n fl = £CM*i. (11.74) We take X\,..., xn, x as generators of B over C. Then the map <PB ••C[Ar1,...1ArfX+i] —► B given by Xj —> Xj for j < n and Xn+i —► x exhibits B as the affine coordinate ring of the curve V defined by the ideal Is = ker <ps- We are to prove that IB is finitely generated as an ideal by elements of k0[X\,... ,Xn+i], and we are to identify the function fields.
8. CANONICAL MODEL OF X0(N) 355 Let Bq be the integral closure of ko[x] in A'o. As above, there exists a basis {y\,..., yn) of K0 over k0(x) consisting of members of B0 such that n flo = 5Z*oWw. (1175) We form the map <Pb0 • k0[X\y... ,Xn+i] —► Bo given by Xj -* j/, for j < n and Xn+i —> x. Let 7a0 = ker <pb0. By the Hilbert Basis Theorem, Ib0 is finitely generated, say by members Pi,..., Pyv of Ar0[A"i,..., Xn+i]. We complexify the exact sequence 0 —► Ib0 —► ko[Xi, • •., A"n+i] —► Bo —► 0 to a sequence 0 — IBo ®k0 C — C[Xl9..., X„+1] —> £0 <8>*0 C — 0. The result is exact since the operation (•) <g>jb0 C is an exact functor on ibo vector spaces. Here Ib0 <8>k0 C is generated by Pi,..., Pyv, and n ; = i Let <p%Q denote the complexified version of y>B0- The elements y; are linearly independent over C(x) by (a) of Theorem 11.36. Hence both {xj} and {yy} are bases of K over C(x). Let xp be the C(x) linear map of K into itself given by Xj —+ yj for 1 < j < n. Then it is clear that X\) o (pB = <p%Q as C linear maps. Therefore their kernels are equal. But \l> is one-one. Hence lB = lB0®k0C. (11.76) The right side is generated by Pi,..., P^v, which are in ko[Xi,..., Xn+i]. Thus Ib is finitely generated as an ideal by elements of ko[X\,..., Xn+i]. The affine coordinate ring of V is #, and the function field is the field of quotients of B, which is A' by (11.74). In the case of V/fro, we
356 XI: EICHLER-SHIMURA THEORY determine the affine coordinate ring by first determining /(V/Jfeo), which is given by i(v/k0) = i(V)nk0[xu...,xn+l] = Ib rik0[Xu...,Xn+i] = (/Bo®*oc)nt0[^i.-...^n+i] by (n.76) By definition k0[V] = k0[X\y... ,Xn+i]//B0, and this is B0. The function field is the field of quotients of Bo, which is A'0 by (11.75). Theorem 11.46. Let t0 be a subfield of C. If C/k0 is a nonsingular projective curve, then there exist elements / and g in ko(C) such that k0(C) = ko(f,g) and C(C) = C(f,g). Moreover, Cnk0(C) = t0. Conversely let K = C(f,g)) be a function field of dimension 1 over C, let to be a subfield of C, put A'o = to(/,</), and suppose that C PI Ko = to. Then there exists a nonsingular projective curve C/to such that t0(C) ~ K0 and C(C) = K. The question of uniqueness of C in Theorem 11.46 is conveniently addressed in the context of a correspondence between certain maps of function fields and certain morphisms between curves. Recall from the end of §7 that a dominant morphism F : C\ —► Ci between nonsingular projective curves (over t) induces a one-one t algebra homomorphism F* : t(C2) —► t(Ci) by composition. F*(f) = f o F. Lemma 11.47. A morphism F : C\ —* C? between nonsingular projective curves over t either is onto or is a constant map. Theorem 11.48. Let Ci/to and C2Jko be nonsingular projective curves, and let F : C\ —► Ci be a dominant morphism defined over to. Then F* carries to(C?2) into to(Ci), and t0(Ci) is a finite extension of F*(t0(C2)). Conversely if/* : t0(C2) —»- t0(Ci) is a one-one t0 algebra homomorphism (carrying 1 to 1), then there exists a unique dominant morphism F : C\ —► C2 defined over to such that F* = T. In particular the nonsingular projective curve given by Theorem 11.46 is unique up to isomorphism. Finally we can put into place the canonical Q structure on the compact Riemann surface Xq(N).
8. CANONICAL MODEL OF X0(N) 357 Corollary 11.49. There exist a nonsingular projective curve C/Q and a biholomorphic mapping <p : X0(N) —► C(C) such that <p*(C(C)) = K{X0(N)) = C(jJN) and ^(Q(C))=Q(j,jn). The curve C/Q is unique up to isomorphism defined over Q, and <p is uniquely determined by the isomorphism of Q(C) with QCm'at). Remark. One says that (^>,C) is a model for Xq(N) over Q. Our practice is to identify C(C) with X0(N) and refer to X0(^)/Q as a Q structure on Xo(N). Proof. Theorem 11.33 gives K(X0(N)) = C(jJN). Theorem 11.32 shows that C(j,Jn) and Q(i,i;v) satisfy (a) of Theorem 11.36 and hence the other equivalent conditions. Theorem 11.46 produces the desired curve C/Q, and Theorem 11.21 produces (p. Finally Theorem 11.48 proves the uniqueness of C and (p. Our occasional observations in §7 about Autjfc0(fc) apply with fc0 = Q and ib = C since the subfield of C fixed by AutQ(C) is Q. This fact allows us to give a reasonable way of deciding what members of K(Xo(N)) are in Q(JJn)- Corollary 11.50. A member of C(j,jn) is in Q(j,jx) if and only if its q expansion at oo has all coefficients in Q. Proof. Any element ip of AutQ(C) acts on C(jJs) and fixes QUiJn) because these fields are associated with varieties. Let f(r)= £ cnqn be in C(j,js). If / is written as a rational function in .; and jw, then f* is the same rational function but with each complex coefficient a replaced by (p(a). The claim is that oo r(r)= £ *>(c„)<f, (11.77) and then the corollary will follow because Corollary 11.49 ensures that the subfield of C(jy js) fixed by AutQ(C) is just Q(JjJn)-
358 XI: EICHLER-SHIMURA THEORY By Theorem 11.32, 1=0 Since the coefficients of the q expansion of jn are integers, it is enough to prove (11.77) for / in C(j) = C(j~*). Since the coefficients of the q expansion of j are integers, it is enough to handle where P is a polynomial without constant term. We have / = i-fP(r1) + p(i"1)2 + ..., and the coefficients of 1, q, g2,..., qM are the same for / as for i+/>(r1) + --- + ^(r1)M- Similarly they are the same for f* as for i + p*(r1) + --+F<>(r1)A'. Then (11.77) follows. Example. If M divides AT, then jM is in K(X0(N)) = C(j, j^). Since the q expansion of jm has rational coefficients, j\f is in Q(j, jV). The inclusion To(N) C To(N/M) induces a holomorphic mapping F of X0(N) onto X0(N/M) whose pullback on K(X0(N/M)) is given by F*(f)(r) = f(Mr). Thus F*(j) = jM and F*{jN/M) = jNy and ^(QU,iN/A/))CQ(j,jN). (11.78) Comparison of Theorem 11.21 and Corollary 11.49 shows that the holomorphic mapping F can be regarded as a morphism between complex varieties. Bringing in (11.78) and looking at Corollary 11.49 again, we see that F is defined over Q. We mention one further corollary of our results. The proof is in the same spirit as the above results and will be omitted. Corollary 11.51. Let ko be a subfield of C, and let C/k0 be a nonsin- gular projective curve. If A'i is a subfield of kQ(C) of finite index containing fc0, then there exist a nonsingular projective curve C/ko and a dominant morphism F : C —► C defined over k0 such that F*(kQ(C')) = K\. The curve C" is unique up to isomorphism defined over k0, and F is uniquely determined by the isomorphism of k0(C) with K\.
9. ABSTRACT ELLIPTIC CURVES AND ISOGENIES 359 9. Abstract Elliptic Curves and Isogenics The Eichler-Shimura theory does not work directly with nonsingular Weierstrass equations. Instead it works with a certain kind of projective curve that turns out to be isomorphic to a curve defined by a nonsingular Weierstrass equation. In this section we shall expand our definition of elliptic curve to include this wider kind of projective curve. We omit proofs of general results about curves but usually give sketches of proofs about elliptic curves. Let ko C k be fields with the same assumptions as in §7. Let C\/ko and C2/ko be nonsingular projective curves, and let F : C\ —► Ci be a morphism. If F = 0, we define the degree of F by degF = 0. Otherwise F is onto, by Lemma 11.47, and F*(ko(C2)) has finite index in Jb0(Ci), by Theorem 11.48. In this case we define degF = [k0(d) : F*(k0(C2))]. (11.79) This degree is the same if computed over k. For the case of characteristic ^ 0, one must pay attention to separability of field extensions in the present context. Any algebraic extension can be obtained in two steps, a separable one followed by a purely inseparable one. For the field extension indicated in (11.79), we let deg, F be the degree of the separable part and deg, F be the degree of the purely inseparable part. Then degF is the product of dega F and deg, F. We say that F is separable or purely inseparable if deg F = deg, F or degF = deg, F, respectively. Recall that the quotient map H —► To(N)\H of Riemann surfaces fails to be a covering map at elliptic fixed points of Tq(N) in 7i. We shall define the counterpart of this kind of ramification for varieties. For F as above but nonconstant, let x be in C\ and let If(x) ^e a uniformizing parameter at F(x). We define eF(x) = ordx(F*(tF^)). The morphism F is unramified at x if eF(x) = 1, ramified if eF(x) > 1. It is unramified if it is unramified at every point. One can prove for every y in C2 that £ eF(x) = degF\ (11.80) *€F-»(V) also for all but finitely many y in C2) #F-1(t/) = deg5F (11.81) It follows immediately from (11.80) that F is unramified if and only if #F-X(y) = degF for all y in C2.
360 XI: EICHLER-SHIMURA THEORY Suppose C is defined over Zp with 1(C) generated by some homogeneous polynomials / in Zp[Xi,... ,X„]. The Frobenius morphism <)> of C is given by *[(x0,...9xn)] = [(J0,...tJn)]. (11.82) Since /(*oi---i*n) = (/(xo, • • • ,**))*, <£ really does map C to C and is defined over Zp. More generally let char(^o) = P and q = pr. If C is defined over fc0 with 7(C) generated by some homogeneous polynomials / in ko[Xoy... ,Xn], we define C^ by the ideal generated by all f(q\ where fM is / with all coefficients raised to the qth power. The qth power FVobenius morphism <j> carries C to C^ and is given by (11.82) with p replaced by q. The situation in the previous paragraph is a special case because C(p> = C when C is defined over Z«. One can prove that (a) *-(*0(C<»>)) = {/« | / € C] (b) <j> is purely inseparable (c) deg<£ = g. In characteristic p, our original nonconstant morphism F : C\ —♦ C*2 admits a factorization F = Fs o <j>, where <f> : C\ -+ C[q) is the gth power Frobenius morphism with <j = deg^ F and where F, : C[9) —+ Ci is separable. We can define the abelian group Div(C) of divisors of a nonsingular projective curve C just as we did for a compact Riemann surface in §4. In the case of / ^ 0 in fc(C), we say / has a zero at x if ordx(/) > 0 or a pole if ordx(/) < 0. The divisor [/] of / makes sense because of the following theorem. Theorem 11.52. If C is a nonsingular projective curve and / is in Jb(C), then / has only finitely many zeros and finitely many poles. The divisor of a function is called a principal divisor. The degree of a divisor £r ordx(D)z is the integer J^. ordx(Z>). Theorem 11.53. Every principal divisor on a nonsingular projective curve has degree 0. If F : C\ —► C? is a dominant morphism between nonsingular projective curves, then F induces maps on divisors in both directions by F.rDMGJ-DKC,) F* ■ Div(C2) - °iv(Cl) x->F(x) V~* H ef(x)x- (1183) *e*->(y)
9. ABSTRACT ELLIPTIC CURVES AND ISOGENIES 361 The idea of the map F* was already implicit in our discussion centered about (11.29). The theory of divisors on C proceeds as in the case of compact Riemann surfaces (§4). The set of principal divisors is denoted Div0(C), and the quotient Pic(C) = Div(C)/Div0(C) is called the divisor class group. Let Pic°(C) be the quotient of the divisors of degree 0 by the principal divisors. Our curve C has an analog of meromorphic differentials on Riemann surfaces. These always exist, and the divisor class of such differentials is called the canonical class. Linear systems L(D) for divisors are defined just as in §4. If D < 0, then L(D) = 0 as a consequence of Theorem 11.53. By an analog of Proposition 11.7, L(0) is 1-dimensional. If fr = C, then the C points of C form a compact Riemann surface, and genus has the customary topological meaning. One can give an abstract definition of genus even in characteristic ^ 0, but for our purposes we may as well define the genus g by its value from the Riemann-Roch Theorem with D = 0: g = dimL(W) for any W ^ 0 in the canonical class. Theorem 11.54 (Riemann-Roch Theorem). Let C be a nonsingular projective curve of genus g, let D be a divisor, and let W be in the canonical class. Then dimL(D) = deg D + dim L(W - D) - g + 1. Corollary 11.55. Let C be a nonsingular projective curve of genus g. If D is a divisor with deg£) > 1g — 2, then dim L(D) = deg D - g + 1. The Jacobian variety Pic°(C) will be identified as a variety for the genus 1 case in Corollary 11.60, and for the higher genus case in §10. If C is defined over fco, the group Autjb0(ik) acts on points of C, fixing exactly those in C(Jbo)- The action on points induces an action on divisors. We say that a divisor D is defined over fro if it is fixed by Autjt0(£). (It is not necessary for each of the points with nonzero coefficient in D to be fixed by Autjk0(fc).) The following result is fundamental. Proposition 11.56. Let C/ko be a nonsingular projective curve and let Dbea divisor defined over k$. Then the k vector space L(D) has a basis defined over fro (i.e., consisting of members of fro(C)).
362 XI: EICHLER-SHIMURA THEORY We come to our expanded definition of elliptic curve. An elliptic curve is a pair (E,0)t where E is a nonsingular projective curve of genus 1 and O is a point in E. The elliptic curve is defined over ko if E is defined over k0 and O is in E(ko). If a curve E is given by a nonsingular Weierstrass equation (3.23) over k0 and if O is taken as the usual point at oo, then (E,0) is an elliptic curve in the new sense, and it is defined over k0. To see this, we need only compute the genus. The expression (11.21) is a member of the canonical class with neither zeros nor poles, hence with divisor W of degree 0. Then Theorem 11.54 with D = W gives g = dim L(W) = deg W + dim L(0) -0 + 1 = 0+1-0+1. Hence g = 1. The converse is as follows. Theorem 11.57. If (E,0) is an elliptic curve defined over &o> then there exist functions x and y in ko(E) such that the map F : E —► /^(fc) with F = [(x,y,l)) has F(0) = [(0,1,0)] and is an isomorphism defined over &o onto a non- singular curve given by a Weierstrass equation (3.23) whose coefficients are in k0. Some comments about the proof may be helpful. For n > 1, L(n(0)) has dimension n, by Corollary 11.55, and we may choose a basis for it from k0(E) by Proposition 11.56. We take any x so that {l,x} is a basis over ko of L(2(0)) and then any y so that {l,x,y} is a basis over ko of L(3(0)). The seven functions l,a:,y,x2,xy,y2,x3 are in L(6(0)), which has dimension 6, and must be linearly dependent. A nontrivial linear relation among the seven functions can be adjusted to give the desired Weierstrass equation. (This argument can be implemented for specific equations. For example, the change of variables that leads from (3.18) to (3.19) and (3.20) can be deduced from it after (3.18) is desingularized at oo.) To show that the Weierstrass equation obtained in the proof of Theorem 11.57 is nonsingular, one proves first that x and y generate k(E) over k. Then it follows that F has degree 1. If the image is singular, Proposition 3.11 allows us to compose F with a certain map to P\(k) such that the composition has degree 1. Since E and Pi (it) are both nonsingular, Theorem 11.48 shows they are isomorphic. But they have different genus, contradiction. The same kind of argument classifies isomorphisms between elliptic curves that are given by Weierstrass equations.
9. ABSTRACT ELLIPTIC CURVES AND ISOGENIES 363 Proposition 11.58. If two elliptic curves (E, O) and (£', O') defined over k0 are given by Weierstrass equations over Jk0 (with 0 and O' corresponding to oo), and if there exists an isomorphism F : E —► E' defined over ko with F(oo) — oo, then E and E' are related by an admissible change of variables (3.43) with coefficients in k0. The idea of the proof is as follows: If E «-► (x,y) and E' <-► (x\y')y then {l,x} and {l,x'} are bases of L(2(0)) and L(2(0'))y while {l,x,y} and {l,x'ty'} are bases of L(3(0)) and L(3(0'))- As a consequence of the above results, we can transfer the group operation to a general elliptic curve from any of its associated Weierstrass equations. We need to know that this group operation fits in with the kinds of mappings we have been discussing, and we need to know how to recognize the group operation without passing back and forth to a Weierstrass equation. Proposition 11.59. (a) For an elliptic curve over ko given by a Weierstrass equation over k0y addition and negative are morphisms defined over k0. (b) For an elliptic curve (EyO), a definition of addition that is consistent with the definition in Weierstrass form is as follows: If D is any divisor of degree 0, there exists a unique point z on E such that D — (z — (O)) is a principal divisor. The sum of x and y is the point corresponding to the divisor x + y — 2(0), the identity is (O), and the negative of x is the point corresponding to the divisor (O) — x. Part (a) is tedious to check. Part (b) follows by repeated use of Theorem 11.54 and little else. Corollary 11.60. For an elliptic curve (£,0), the mapping x —► x—x{0) of E into Div(£) yields a group isomorphism of E onto Pic°(/£). The corollary is just a reworded version of Proposition 11.59b. The Riemann surface analog is Corollary 11.16a. From now on, we shall typically say, "Let E be an elliptic curve," dropping reference to the group identity. The identity will be denoted O and for Weierstrass equations will be the point at oo. If Ei and E2 are elliptic curves over k, an isogeny between them is a morphism F : E\ —► Ei with F(0) = O. All nonzero isogenics are onto, by Lemma 11.47. When k = C, we saw in Chapter VI that isogenics are automatically homomorphisms. The same thing is true in general.
364 XI: EICHLER-SHIMURA THEORY Proposition 11.61. An isogeny is a group homomorphism. Sketch of proof. We may assume that the isogeny F : E\ —* E2 is not zero. Referring to (11.83) and Corollary 11.60, we write F as the composition of three homomorphisms E\ ~ Pic°(£i), Fm : Pic°(Ei) —► Pic°(£2), and Pic°(£2) ^ E2. Examples. 1) If E is any elliptic curve, then multiplication by the integer my denoted [m], is an isogeny from E to E. 2) In §1 we wrote down an explicit isogeny F : Ef —► E between two elliptic curves defined over Q such that j(E') — —163/11 and j(E) = -212313/115. 3) If E is an elliptic curve over Zp, then the Frobenius map <f>: E —► E given in (11.82) is an isogeny. If E\ and E2 are elliptic curves defined over k, we let Hom(E\yE2) be the set of all isogenies from E\ to E2. The pointwise sum F + G of two members of Hom(E1,£,2) is again in Hom(Ei,£,2), Deing the composition of diagonal : E\ —► E\ x E\, F x G : Ex x Ex — E2 x £2, and 4- • E2 x i£2 —► E2. Then Hom(i?i,2?2) becomes an abelian group. The subset of isogenies defined over fco is a subgroup. We write End(E) in place of Hom(£*, E) when the two curves are the same. The abelian group End(£) has a multiplication given by composition. The distributive law F(G + H) = FG + FH follows immediately by applying Proposition 11.61 to F. Let E be an elliptic curve. For x in E> we define a translation map Tx : E — E by Tx(y) = i + y. Then Tx* is an automorphism of i(E)\ if E is defined over k0 and x is in E(k0), then Tx" is an automorphism of*o(£). Proposition 11.62. If F : E\ _—► £2 is a nonconstant isogeny between elliptic curves defined over Jb, then ker F is finite, and the map x —► Tr* induces an isomorphism kerFSAut/..(j(Bj)nt(£1)). (11.84) If F is separable, then F is unramified and degF = #ker(F); (11.85) moreover, fc(Fj) is a Galois extension of F*(/fc(F2)).
9. ABSTRACT ELLIPTIC CURVES AND ISOGENIES 365 The proof uses (11.80) and (11.81), as well as a little Galois theory. We omit the details. For an elliptic curve over Zp with k — Zp, Proposition 11.62 gives a way of calculating #F(ZP). In fact, the Frobenius morphism <j> of (11.82) fixes exactly those points of E that are in E(2p). Thus x € F(Zp) <=* 4>(x) = x <=► ([1] - 4>){x) = O <=> x e ker([l] - <t>) and #Z?(Zp) = #ker([l]-<*). (11.86) A fundamental fact, which we shall not prove, is that 1 — <f> is a separable isogeny. (11.87) Combining this fact with (11.85) and (11.86), we obtain #£(ZP) = deg([l] - *). (11.88) Lemma 11.63. Let F : Ei —► E2 and G : E\ —* F3 be nonconstant isogenics of elliptic curves over k. If F is separable and kerF C kerG, then there exists a unique isogeny H : E2 —► E3 such that G = H o F. If F and G are defined over k0y so is H. Proof. The inclusion of kernels and (11.84) give natural isomorphisms Autj,.(i(isa))(t(JEi)) ~ ker F C kerG S AutG.(£(fi3))(i(£7i)). Hence every automorphism fixing F*(k(E2)) fixes G*(fc(F3)). By Proposition 11.62, k(Ei) is Galois over F*(ifc(F2)). Hence G*(k(E3)) C F*(k(E2)). Then Theorem 11.48 gives us a corresponding morphism i/ : F2 — F3 with F* o H* = G*. By uniqueness in Theorem 11.48, G= H oF. Then O = G(O) = H(F(0)) = H(0) shows that // is an isogeny. If F and G are defined over k0y we can apply Theorem 11.48 to G*(ko(E3)) C F*(k0(E2)) to construct // as defined over fc0. Theorem 11.64. Let F : £1 —> £2 be a nonconstant isogeny of degree m between elliptic curves defined over k. Then there exists a unique isogeny F : E2 —> E\ such that F o F = [m]. If F is defined over fr0, so is F. REMARKS. F is called the dual isogeny to F. We say that E\ is isogenous to E2 if there is a nonconstant isogeny from E\ to E2. As a consequence of this theorem, "is isogenous to" is an equivalence relation. If we restrict attention to elliptic curves defined over Jb0, it is still an equivalence relation.
366 XI: EICHLER-SHIMURA THEORY SKETCH OF PROOF. As a morphism, F factors into F = Fs o(f>, where F8 is separable and <j> is a qth power Frobenius. Thus it is enough to treat Fs and <j> individually. We treat only FS) changing notation and writing F for it. By (11.85), #ker(F) = m. Thus the map [m] : Ei -+ Ex annihilates kerF, and kerF C ker[m]. Application of Lemma 11.63 completes the proof. Dual isogenics have the following properties, whose proofs we omit. Proposition 11.65. Let F : E\ —► E2 be an isogeny of degree m. (a) F o F = [m] on Ely and F o F = [m] on E2. (b) deg F = deg F and (Pf = F. (c) [m]"= [m] and deg [m] = m2. (d) If G : F2 -► E3 is an isogeny, then (G o F)~= F o G. (e) UH:El->E2 is an isogeny, then (F + H)~= F + H. The next result is a converse to Proposition 11.62. Theorem 11.66. Let E be an elliptic curve over £, and let S be a finite subgroup of E. Then there exist an elliptic curve E1 and a separable isogeny F : F —► E1 such that kerF = 5. The curve F' is unique up to isomorphism. If E is defined over k0 and if 5 is stable under Autjb0(fc), then Ef is defined over fro and may be taken to be defined over fro- Idea of proof over fr if char(fr) = 0. The set {Tx* \ x e 5} is a finite group of automorphisms of fc(F). We apply Lemma 11.35 to determine a subfield, Theorem 11.44 to define a nonsingular projective curve C, and Theorem 11.48 to define a dominant morphism F : E —► C. One has to prove that F is unramified and then that the genus of C is one. Let us return to elliptic curves over Q. The L function of such a curve was defined in (10.9). Theorem 11.67. Let E and E' be elliptic curves over Q that are isogenous over Q. Then L(syE) = L(s,E'). SKETCH of proof. The two L functions will be equal factor by factor. We treat the pth factor only for primes p not dividing A or A', omitting the others. If F : E —► E' is a nonconstant isogeny defined over Q, one has to check that F induces a nonconstant isogeny Fp : Ep —► Ep
10. ABELIAN VARIETIES AND JACOBIAN VARIETY 367 defined over Zp. Taking this for granted, let <j>\ and $2 be the Frobenius morphisms of Ep and Ep defined in (11.82). We can write Fp = Fs<)>\ with F9 : Ep —► E'p separable and defined over 2p. So we may as well assume from the outset that Fp : Ep —► Ep is a separable isogeny defined over Zp. If we write out the morphism Fp as [(Fo, i^i, -^2)] 1 we nave = [(i:o(x)^F1(*)^F2(*)',)] = MFp(x)). Thus Fpo^=^2oF,. (11.89) If y is in E'p, then y = Fp(x) for deg Fp values of r, by (11.85) since ^i is onto. For such an x, yeE'p(zp) <^=> My) = y <=* </>2Fp(x) = Fp(x) ^=> iGker(([l]-^2)Fp). Thus #W} " d^Fp = #ker(Fp([l]-^)) deg Fp by (11.89) = deg(F,([l]-^)) by(n87)and(11.85) degFp = deg(Fp)deg([l]-^) degFp = deg([l]-<Ai) = *EP(2P). Hence the pth factors of L(sy E) and L($} E') match. 10. Abelian Varieties and Jacobian Variety In this section our algebraically closed field k will be C, and the sub- field k0 will be Q. An abelian variety A is a nonsingular projective variety over C with a distinguished point O and with an abelian group
368 XI: EICHLER-SHIMURA THEORY structure such that O is the identity and the operations of addition and negative are morphisms. The abelian variety is said to be defined over Q if A is defined over Q, O is in j4(Q), and addition and negative are defined over Q. Any elliptic curve (over C or Q) is an example, by Proposition 11.59a. Conversely an abelian variety of dimension 1 is an elliptic curve. (We have only to see that the genus is 1. We can work with j4(C), which is a compact Riemann surface. TVanslations are fixed- point free automorphisms homotopic to the identity, and the Lefschetz Fixed-Point Theorem allows these only in genus 1.) The goal of this section is to introduce the Jacobian variety J(C) of a nonsingular projective curve C defined over Q. J(C) is to be an abelian variety defined over Q and is to play the same role in geometry over Q that the Jacobian variety of §4 plays in the theory of compact Riemann surfaces. Let A and B be abelian varieties. A homomorphism F : A —> B is a morphism of varieties that is also a homomorphism of groups. We say that F is an isomorphism if it is an isomorphism of varieties and an isomorphism of groups. If A and B are defined over Q, we say F is defined over Q if it is defined over Q as a morphism. The set Hom(j4,£) of homomorphisms from A to B is an abelian group under pointwise addition. For the special case that A = 5, we write End A in place of Hom(,4,A). This is a ring with composition as multiplication. If A is an abelian variety and C is a subvariety (obtained by enlarging the defining homogeneous ideal of polynomials), then the inclusion map is automatically a morphism. If C has a group structure compatible with that of A, then inclusion and addition give morphisms CxC —>AxA—+ A with image in C that show that addition on C is a morphism. Similarly negative on C is a morphism. We say that C is an abelian subvariety of A. The kernel of a homomorphism between abelian varieties is an example if it is connected in the complex manifold topology. We shall make use of the following two propositions. Proposition 11.68. If F : B —► A is a homomorphism between abelian varieties, then C = image F is an abelian subvariety of A. If A, Bt and F are defined over Q, then C may be taken to be defined over Q Proposition 11.69. Let A be an abelian variety, and let C be an abelian subvariety. Then A/C may be given the structure of an abelian variety in the following sense: There exists a pair (A\ F) such that A' is an abelian variety, F : A —► A' is a homomorphism onto, ker F =
10. ABELIAN VARIETIES AND JACOBIAN VARIETY 369 C, and any homomorphism F" : A —> A!' of abelian varieties with ker F" D C factors through F, i.e., F" = F' o F for a homomorphism F' : A' —► A" of abelian varieties. The pair (A',F) is unique up to canonical isomorphism. If A and C are defined over Q, then A' and F will be defined over Q; in this case, when A" and F" in the universal property are defined over Q, F' may be taken to be defined over Q. The next proposition gives some properties of abelian varieties and homomorphisms in order to shed some light on the definitions. Although the proposition helps to motivate parts of §11 and Chapter XII, it will not be used explicitly. Proposition 11.70. Let A and B be abelian varieties. (a) A homomorphism F : A —► B is onto if and only if its kernel is finite. (In this case, A and B are said to be isogenous.) (b) Isogeny is symmetric and hence is an equivalence relation. (c) If V is a nonsingular projective variety and F : V —* A is a rational map, then F is a morphism. (d) Any morphism F : A —► B is the composition of a homomorphism and a translation. (e) If 5 is an abelian subvariety of Ay then there exists an abelian subvariety C of A such that B n C is a finite group and A = B + C as groups. If A and B are defined over Q, then C may be taken to be defined over Q. (f) A is isogenous to a product of abelian varieties that are simple (in the sense of not having any proper nonzero abelian subvarieties), and the factors of the decomposition are unique up to isogeny. For X a compact Riemann surface of genus </, we defined the Jacobian variety J(X) = C9/A(X) in §4. With a base point xQ fixed in X, there is a canonical holomorphic mapping $ : X —* J(X) with the following universal property. Whenever F : X —► T is a holomorphic mapping into a complex torus, then F factors through the Jacobian variety: F = fo$ + F{x0) for some holomorphic homomorphism / : J(X) —► T. When the base point in X is changed, * gets composed with a translation. The universal property makes J(X) unique up to a biholomorphic group isomorphism. If J(X) is fixed, $ is unique up to composition with a translation and a biholomorphic automorphism of J(X).
370 XI: EICHLER-SHIMURA THEORY If C is a nonsingular projective curve over C, then C(C) has an underlying complex manifold structure, hence is a compact Riemann surface. Thus C has a Jacobian variety in the above sense. The following theorem solves the problem of putting on the Jacobian variety the structure of a nonsingular projective variety. Theorem 11.71. Let C be a nonsingular projective curve over C, let J(C) be the Jacobian variety of the underlying compact Riemann surface, and let $ : C —► J(C) be the canonical holomorphic mapping with base point x0. Then J(C) admits the structure of a nonsingular projective variety in such a way that (a) its group structure makes it into an abelian variety, (b) $ is a morphism, and (c) whenever F : C —► A is a morphism into an abelian variety, then F factors through J(C) as F = f o <& + F(x0) for some homomorphism / : J(C) —► A of abelian varieties. Moreover, if C is defined over Q, then J(C) can be defined over Q in such a way that (a), (b), and (c) are valid with structures defined over Q There is a uniqueness statement for Theorem 11.71, just as in the setting of compact Riemann surfaces, and it is again easy. Existence, however, is a deep question. Theorem 11.71 is due to Lefschetz for C and to Weil and Chow for Q. The following result will be handy in detecting morphisms that involve J(C). Theorem 11.72 (Chow). If V\ and V2 are nonsingular projective varieties over C and if F : V\ —► V2 is a holomorphic mapping between their underlying complex manifolds, then F is rational over C. We shall apply Theorem 11.71 to C = Xo(N), which is defined over Q. The ring of holomorphic homomorphisms of J(Xq(N)) into itself is the same as the ring Eud(J(Xo(N))) of abelian variety homomorphisms defined over C, as a consequence of Theorem 11.72. Two key steps in the Eichler-Shimura theory are to transform the Hecke operators into members of End( J(Xq(N))) and to show that the resulting homomorphisms are defined over Q. We carry out the first of these steps now. For this purpose we fix notation as follows: Let Xq(N) have genus g, and let wi,...,^ be the basis of Ct\)O\(X0(N)) used to define J = J(X0(N)) concretely as &/\(X0(N)). (Recall that C and A(X0(N)) are not affected by changing the Z basis of cycles.) Let ir : H* —► To(N)\7i* be the quotient map and $ : 7i* —► J be the composition
10. ABELIAN VARIETIES AND JACOBIAN VARIETY 371 $ = $ o 7T. If we put fj{r) dr = tt*(u;j), then /i,..., fg is a basis for S2(r0(AO) and * >s given by *(T)={/r/i(0dcF_ (1L90) for any base point r £ ^^(xo). Since J = C3/A(Xq(N)), we may identify the tangent space j to J at the group identity O as the same C9. Its standard basis will be denoted ei,..., etf. The group J is a Lie group, and its Lie algebra is j = C9, column-vector space. Any real analytic homomorphism f of J into itself induces (by passage to the differential df at O) an R linear map of C9 into itself, and distinct homomorphisms yield distinct linear maps. The members of End(J) are holomorphic, and their differentials are consequently C linear maps of j = C* into itself. In this way we get a one-one ring homomorphism of End( J) into the algebra of all g-by-g complex matrices: End(J)—>M(gtC). (11.91) We can use z\,..., zg as coordinates on «/, and the space fihoi(«/) of holomorphic 1-forms is the C linear span of dz\,... ,dzg. The spaces fihoi(«J) and j are in natural duality, the pairing being given by (dzi}€j) = Sij. If we identify j with the space of invariant vector fields on J, we can regard this formula also as a valid pairing at any point of J. For / in End(J), we define an endomorphism Sf of Q^Q\(J) by <(«/)«, w) = (ti,(d/)v) for u E JJhoi(J) and v£). (11.92) Let us check that **{dzi)=fi(T)dr. (11.93) In fact, at any point ri, differentiation of (11.90) at T\ gives a (a \ (fdn)\ V fg(n I Since (dr, —I ) = 1, we obtain (11.93). Since $* maps basis to basis, dr I ti 4>* is a vector space isomorphism. Therefore it makes sense to define V- : S2(T0(N)) - fiho,(J) by **(/«(/)) = /(»■)*• for / € S2(To(N)). (11.94)
372 XI: EICHLER-SHIMURA THEORY Let M(n, N) = [J*=1 r0(W)a,- as in (11.28). In (11.29) and (11.32) we defined consistently a Hecke operator T(n) : X0(N) —► Div(X0(N)) by T(n)(,r(r)) = £>(a.r)). (11.95) K 1=1 The mapping $ : Xq(N) —► J extends by additivity to a map $# : DW(X0(N)) — J, and then we can form the composition T#(n) — $# o T(n) given by T#(n)(*(r)) = £*(x(a,r)) = [ . (11.96) /r0 Equation (11.96) exhibits T#(n) as holomorphic; by Theorem 11.72 it is a morphism of varieties over C. In §11 one of the steps will be to prove that T#(n) is defined over Q. Applying the universal mapping property to T#(n) : Xo(N) —► J (with A = «/), we obtain a member t(n) of End J such that T#(n)(n(r)) = «(n)(*(r)) + T#(n)(7r(r0)) for all r. What this equation says is /£/i<o*\ /E,ro,T/'(Orfc\ «(») = • (11-97) \/;/f(o*/ \E.r0,T/s(orfc/ Since <(n) is additive, we can evaluate (11.97) at T\ and subtract the result from (11.97) to obtain (11.98) To compute the differential <ft(n), we can differentiate the formula for t(n) along any curve extending from the O vector. Differentiating (11.98)
10. ABELIAN VARIETIES AND JACOBIAN VARIETY 373 at r = r\ and then writing r for T\ (since T\ is arbitrary), we find dt(n) : = '£.-/i°[<*.-]2(r)\ /T„(n)/i(r)' .E,/soK]2(r)/ \T2(n)/,(r)y (11-99) for all r. Consequently the matrix of dt(n) is A(T2(n))y in the notation of (11.47). We shall use the reformulation below of this result. Proposition 11.73 (Shimura-Taniyama). For / in S2(To(N))} («(n))(/i(/)) =/i(T2(n)/). (11.100) Proof. Write / = $2 ckfk> and let e\ be a standard basis vector of j. Then we have ((ft(n))(M/)), *i) = (M/), «ft(n)c) by (11.92) = £**(<***, *(n)(«i)> = 5>A(Ta(»))w and (/i(T2(n)/), e,> = 5></*(T8(n)/t), e,) = 5^c*<|i(il(ra(n))fcm/m)I«i) = £ct>l(r2(n))tm<<f*m,e,) ib,m = £c^(T2(n))H. A: The above equations show that the two sides of (11.100) are paired equally with any member of j and therefore must be equal.
374 XI: EICHLER-SHIMURA THEORY 11. Elliptic Curves Constructed from S2(ro(A0) We come to the main theorem of the Eichler-Shimura theory. We continue with the notation X0(N), g, A(X0(N))} J, $, {/i,... ,/^}, r0i j, End J, nho\{J), d and 6, fiy T(n)y T#(n)} t(n), dt(n), and 6t(n) as in the latter part of §10. In order to handle one step in the proof of the main theorem, we shall assume that the basis {/i,...,/$} of S2(Tq(N)) over C has been constructed as in Theorem 11.25, so that the matrices of the Hecke operators T2(n) have integer entries. By (11.99) these integer matrices are the matrices of dt(n). We define EndQ(J) = End(J) <g>z Q. By (11.91), EndQ(7) can be regarded as an algebra over Q of g-by-g complex matrices. Theorem 11.74 (Eichler-Shimura). Let /(r) = £~=i cne2*inr be a newform in S2(To(N)) normalized to have c\ = 1, and suppose that all cn are in Z. Then there exists a pair (E,v) such that (a) E is an elliptic curve defined over Q, and (E,i/) is a quotient of J by an abelian subvariety of J defined over Q, (b) the members t(n) of End( J) leave A stable and act on the quotient E as multiplication by the integers cn, (c) fi(f) is a nonzero multiple of ^*(t4/), where u> is the invariant differential (11.21) of E, (d)if A/ = {♦/(t) = f T° f(C)dC | t € r0(A0}, J To then Aj is a lattice in C, and E is isomorphic to C/Aj over C, (e) the L functions of E and / coincide as Euler products except possibly at finitely many primes. Moreover, properties (a) and (b) characterize A uniquely and therefore determine (E) u) up to isomorphism defined over Q. Remarks. Composing v in (a) with $ : Xq(N) —► J, we obtain a morphism i/o& : Xo{N) -+ E defined over Q. The proof of the theorem does not use that / is a newform, only that / is an eigenfunction of all the Hecke operators X^n). Work of Igusa shows that the Eichler- Shimura argument for (e) is applicable to all primes not dividing N. If / is a newform, then Theorem 12.8 will note that L(E,s) and L(/, s) match exactly.
11. ELLIPTIC CURVES CONSTRUCTED FROM S2(rQ(N)) 375 The proof has a characteristic 0 part and a characteristic p part, the latter to handle (e). We shall give the full characteristic 0 part in this section and shall discuss the characteristic p part in §12. We need the following extension of the standard Wedderburn theorem on semisimple associative algebras. Lemma 11.75 (Wedderburn). Let T be a finite-dimensional associative algebra with identity defined over a field k> and let 7£ be its nilradical (largest two-sided ideal). Then there exists a semisimple subalgebra S of T such that T = S 01Z as vector spaces. Moreover, S is a direct sum of ideals, each of which is a simple algebra isomorphic to a full matrix algebra over a division algebra over k. REMARK. We need the lemma only in the case that T is commutative, and then the simple algebras in S are isomorphic to fields that are finite algebraic extensions of k. In order to get to the body of the proof quickly, we postpone to the end of this section the proof of the next lemma. Lemma 11.76. The members t(n) of End( J) are defined over Q. Proof of Theorem 11.74 except part (e). Let T be the commutative Q subalgebra of Endg(J) generated by all the t(n) in End(J). Since each t(n) may be identified with the g-by-g matrix of its differential dt(n), which has integer entries, T is isomorphic to a subalgebra of M(<j, Q). Therefore T is finite-dimensional over Q. We apply Lemma 11.75 to the algebra T and the field Q, obtaining r = s e n = (fci e • • • e K) e ft, where 1Z is the nilradical and each kj is an ideal of S isomorphic to a finite algebraic extension of Q. By Proposition 11.73, we have (ft(n))(M/)) = *./*(/)• Hence there exists a well defined Q algebra homomorphism p oiT into Q given by the formula p{t{n))-cn. It is clear that p{1Z) ~ 0. Changing the order of the factors k\,..., kr if necessary, we may assume that p(k\) = Q. Since k\ is a field, k\ = Q. Let p1 : Q —* ki be the inverse of p to kXi and define an ideal U of T by u = (jb2e---e4r)eft.
376 XI: EICHLER-SHIMURA THEORY The abelian subvariety A will be the sum of all <*{T) for a in U H End(J). Let us see that A is indeed a subvariety and is defined over Q. The members of End(J) may be viewed via (11.91) as members of M(</, C) that carry A(Xo(N)) into itself. With this identification the members of EndQ(J) are Q linear combinations of these matrices. Let a G U fl End(7) be given. Being in T, a is a polynomial in the <(n)'s with rational coefficients. Hence there is a nonzero integer m such that ma is an integer polynomial combination of the /(n)'s. According to Lemma 11.76, each t(n) is defined over Q. Therefore ma is defined over Q. Since a and ma have the same image in «/, Proposition 11.68 shows that image(a) is an abelian subvariety of J defined over Q. Now suppose we have two abelian subvarieties A\ and A2 of J defined over Q. Their sum A\ -t- A2 is the image of A\ x Ai C J x J —► J under addition. Since A\ x A? and addition are defined over Q, another application of Proposition 11.68 shows that >ii H- A2 is defined over Q. We now iterate this construction. With each iteration, either the dimension goes up (of the abelian subvariety as a connected complex Lie group) or nothing new happens. We conclude that A, as we have defined it, is an abelian subvariety of J defined over Q. By Proposition 11.69 we can form the quotient (E,v) of J by A, and it can be taken to be defined over Q. We shall prove shortly that E has dimension 1, and then (a) will be proved. In the meantime, each 0 G TREnd(J) maps J into A. In fact, if a is in A, write a = J2ak(zk) with a* G U fl End(J) and Zk G J- Since U is an ideal, each /to* is in U fl End(J), and we have 0(a) = £><**(**) G £>**(./) C A. Consequently each 0 G TflEnd(J) maps J into A. Applying this conclusion to f(n), we obtain the first conclusion of (b). Since i(n)(A) C A, we have (ker 1/o T(n)) C keri/. By the universal mapping property of (E, v) in Proposition 11.69, there exists t(n) G End(E) with t(n)ou-uot(n). (11.101) In other words t(n) acts on E as i(n). To calculate what this action is, we observe that t(n) — j/(c„) is in U by construction and p'(cn) — [cn] is in U also, where [c„] denotes multiplication by the integer cn. Hence t(n) — [cn] is in U fl End(J). In other words, t(n) — [cn] passes to the quotient and acts as 0. Thus i(n) = [cn]. This proves (b). To prove that dimE > 0, we are to prove that A ^ J. Let m > 0 be the integer for which kiKm ^ 0 and kx1lm+l = 0, and let 0 be a
11. ELLIPTIC CURVES CONSTRUCTED FROM 52(r0(N)) 377 nonzero member of kiHm. Possibly multiplying (3 by a nonzero integer (as in an argument earlier in the proof), we may assume /? is in End(J), not just EndQ(J). If a is in U, then (3a = 0 (because k\kj = 0 for j > I and HmH = 0). If a is in A, we can write a = YlQk(zk) with Qk € U fl End(J) and zk 6 J. Since /?afc = 0, we have /3(a) = 0. Therefore /? is a nonzero member of End(J) that annihilates A. Hence A± J. We shall now prove (c) and prove that dim E < 1, thereby completing the proof of (a). Let u>' be a nonzero member of fihoKE) (existence by previous paragraph), and let i/* be the pullback mapping i/* : fihoi(^) —* ^hol(«J) induced by v. Since v on the complex manifold level is just a homomorphism of one complex torus onto another, u* is one-one. Applying (•)* to (11.101), we have v* o 6i(n) = 6t(n) o v*. Now i(n) = [cn] implies that Si(n) = cn • 1. Hence «(!»)(!/•(«')) = e„i/>')- If we define /' = /i_1(j/*(w')), then /i(/') = "*(<*>'), and Proposition 11.73 gives M(Ta(n)/') = cn/i(/') and hence T2(n)/' = cnf. (11.102) If dimE > 1, then there exist linearly independent u/ and cj" in fihoi(-E), and ^*(u/) and */*(u/') will be independent. If we put /" = iTl{v*{u>")), then the above argument shows that T2(n)f" = cnf". (11.103) Since /' and /" are independent, (11.102) and (11.103) together contradict Proposition 9.20b. Thus dimi? = 1, and the proof of (a) is complete. We can take w1 to be the invariant differential (11.21) of E in the above argument, and (11.102) and Proposition 9.20b force /' and / to be linearly dependent. This proves (c). Let us prove uniqueness. Suppose A and (E',i/') satisfy conclusions (a) and (b). Let u/ and u> be the invariant differentials (11.21) of £" and Ey respectively. Then the argument in the previous paragraph shows that i/*(u/) is a multiple of/i(/). Since we already know that v*(u>) is a multiple of /i(/), ^'*(u/) and v*(w) are multiples of each other. Hence
378 XI: EICHLER-SHIMURA THEORY i/*(u/) and ^*(w) annihilate the same members of j. The respective annihilators are the tangent spaces in j of ker j/ = A1 and keri/ = A. Since A1 and A are connected Lie subgroups of J with the same Lie subalgebras, we conclude that A1 — A. Then (E, u) is isomorphic to (£",i/) over Q as a consequence of Proposition 11.69. Let us prove (d). The natural pairing of £2hoi(</) with j = C9 makes /i(/) acts as a linear functional on j. We calculate the effect of /i(/) on A(X0(N)) = ker(j — J). The lattice A(X0(N)) has generators \LkfAOdCt where c\,..., c-i9 represent a Z basis of H\ (Xq(N), Z). Write / = 5Z rjfj • Then j = 53ri{dzj,at>=X;'-i / /*KK = / /«)#• j j Jck Jck By (11.37) and Proposition 11.22, we conclude that M/)(A(*o(AO)) = X;Z / /(C)rfC = A/. (11.104) Let a C j be the Lie algebra of A (the tangent space at the identity). We shall verify that ker/i(/)= a. (11.105) In fact, /i(/) is a nonzero multiple of v*(<jj), and thus ker^i(/) = {ti€j | (!/»,«> = 0} = {tt6j|((Jf(A/)(u)> = 0} = {w € j | di/(u) = 0} since a; spans £lho\(E) = ker(di/) = (Lie algebra of A) = a. This proves (11.105). The homomorphism j —► J with kernel A(Xq(N)) is really the exponential map of Lie theory and thus must implement a —► A. Hence the kernel of a —> A is ar\A(X0(N)). Since ,4 is compact, af\A(Xo(N)) is a
11. ELLIPTIC CURVES CONSTRUCTED FROM S2(r0(N)) 379 lattice in a, evidently of rank 2g — 2. Let x\,..., x2g-2 be a Z basis for it, and adjoin x2g-\ and x2g in A(Xo(N)) so that A' = Yl]3=i^xj nas rank 2g. Then A' has finite index in A(Xo(N))} say m, and it follows that A(X0(N)) is contained in ^-A'. Now c = /i(/)(i) = /i(/)(£R*i) = /i(/)(R*3f-i + Rx2s) shows that /i(/)(^2^-i) and fi(f)(x2g) are linearly independent over R. (11.106) On the other hand, M/)(Z*a,-i + Z*2„) = M/)(£Z*i) = rtWA') i C /i(/)(A(Xo(AT))) C Kf)(m-lK') 3 = fi(f)(m-llx2g-i + m~ 1Zx2*)- (11.107) Combining (11.104) with (11.106) and (11.107), we see that A/ = ii(f)(A(Xo(N))) is a free abelian subgroup of C of rank 2 that spans C over R. Consequently A/ is a lattice. By Theorem 6.14, E' = C/A/ is an elliptic curve over C. Let r\ : C —► C/A/ be the quotient homomorphism. The composition r\ o fi(f) : j —♦ E' is given by j —> j/a ~ C — C/A/ - £' and has kernel ^(/^(A/), which equals a + A(X0(jV)) by (11.104) and (11.105). Since j —* J is a covering homomorphism with kernel A(Xq(N)), rjo /i(/) factors through j —► J. Let us say ?0/|(/)=£0(j->/) with e : J —► i?' a holomorphic homomorphism with kernel the image of a + A(XQ(N)) under j -► J, namely A. By Theorem 11.72, e is a morphism defined over C. Since kere = A, the universal mapping property of (E,i/) in Proposition 11.69 implies that e = e' o v for a morphism e' : E —* J£' defined over C. The kernel of e' is trivial. Combining (11.85) with our results on dual isogenies, we see that e' is an isomorphism over C. This completes the proof of (d) and all of the characteristic 0 part of Theorem 11.74 except for Lemma 11.76.
380 XI: EICHLER-SHIMURA THEORY PROOF OF LEMMA 11.76. Because of the universal mapping property of J, it is enough to prove that T&(n) is defined over Q. Here T*(n) is given by T*(n)(r) = £ r^ f(C) rfC, r e XQ(N), (11.108) where f = I I. (To make sense of the o^r's in this formula, we must choose a representative of r in W*, compute each 0,7", and project back to Xq(N). We always assume this has been done.) Let G = AutQ(C). This group operates on various things in our setting. When a member <p of G acts on a polynomial P to give Pv, P^ is obtained by having y? act on just the coefficients. When the polynomial gets regarded as a member of a local coordinate ring of a variety, for example, <p acts on the points of the variety and the relationship is P*(x) = <p(P(<p-1x)). (11.109) We shall use this notion in order to localize the proof of rationality. If (F0, • • •, Fm) gives the imbedding of J as a nonsingular variety in a projective space, we may take Fo, • • •, Fm to be defined over Q.* Then T#(n) is certainly defined over Q if we show that r —ff(r)=(£ pTf(C)rfc) forP = P0,...,Pm (11.110) is in Q(X0(N))y i.e., that H* = H for all <p G G. In view of (11.109) it is enough to show for all r e X0(N), <p G G, and F € Q(«/) that ^{T.n rf(o<*<)W(£f'rf(<)<*;) (ii-iii) whenever F is defined at all points of the relevant finite set. *As in Theorem 6.14, one version of Fo,...,Pm may not handle all points of J simultaneously. We may have to normalize (Fq ,..., Pm) by one or more members of Q(J) to handle all points. This is not a serious matter for the present proof, and we shall ignore it.
11. ELLIPTIC CURVES CONSTRUCTED FROM S2(r0(N)) 381 Since F is defined over Q and addition within J is defined over Q, we can move (p past F and the summation sign on the left side of (11.111), and it is enough to prove that JX/"" rf(CK)=£r,Tf(OdC (11.H2) ,• •'To ' j J To for all r and <p. Since $ : Xo(N) —* J is defined over Q, , fr' Mr') <p(JT f(c)rfc) = y f(o«/c (n.113) for all r'. Thus (11.112) comes down to proving that ,- Ao t -/to or equivalently (under the change of variables r —► y>(r)) rip(atT) rat<p(r) J2 fKK = £/ f(0«- (11.H4) j J to t «/r0 Fix r 6 Xq(N) and y? £ G (together with representatives of r and <p(r) in W*, called by the same names). We shall prove for a suitable permutation i —► /(i) that ^(of,-r) = a/(i)y?(r) for all i, (11.115) and this will prove (11.114) and the lemma. Let h be an arbitrary member of Q(j,Jn) that is defined at every point of a finite set of points of interest in Xq(N) (including all points in (11.115)). We form the polynomial K Y[(X - h o ai) = XK + hK.1XK~l + ... + hQ. (11.116) •=i If 7 is in To(N), write otij = 7t<*j(») for a permutation i —► j(i). Then K K K H(X-hoaioy) = H(X-hoaJii)) = l[(X-hoai)i •=i
382 XI: EICHLER-SHIMURA THEORY from which it follows that hx-i,..., Ao in (11.116) are invariant under Tq(N). By the same argument as in Proposition 9.5, all of Aa-_i, ..., h0 are in Ao(To(N)). From §6 we know we can view these functions as in K(Xo(N)), which Theorem 11.33 identifies as C(j,jn). We shall prove they are in Q(j,Jn). Let h(r) = Y^=-m Cn<ln ^e ^e 9 expansion of h at oo, with q = e2*tT. Since h is in Q(i,in), the c„'s are in Q. Let a,- = f q^J, and put qd = e2*iTld and <d = e2*{ld. Then fco«<(r) = fc(^) = £) cn&l Expanding out the left side of (11.116), we see that the coefficients in the q expansions of fe/c-i, • • • >^o are in Q(Ov)- Now we can argue as in the last paragraph of the proof of Theorem 11.32 to see that the coefficients for fejc-i,..., ho are in fact in Q. Applying Corollary 11.50, we conclude that fcjr-i,..., h0 are in Q(j,J7v). Let us evaluate (11.116) at r: K ]J(X - h(<*iT)) = XK + hK-i(r)XK-1 + • • • 4- &o(r). •=i Since h is in Q(jJN), we have h*~l = h. Thus h(r') = ^(M^O"'))). and K l[(X - ^(A^r)))) = XK + Ajr-^r)**-1 + • • • + A0(r). •=i At this stage we are dealing with a polynomial over C, and it makes sense to apply <p to both sides (by applying it to the coefficients). We obtain K H(X - h(<p(aiT))) = XK + ip(hK^1(r))XK'1 + ■ • ■ + <p(hQ(r)) •=i = xK + fc^Wr)^-1 + • ■ • + M^)), since fc/c-i,..., h0 are in Q(j,jV) and hence are fixed by y?. The above expression is
12. MATCH OF L FUNCTIONS = fl(X-h{aMr))) 1=1 by (11.116). Consequently Mv>(a^)) = h(aKi)<p(r)) 383 (11.117) for a permutation i —► /(£) depending on h (as well as r and ip). We need to obtain (11.117) with a permutation that is independent of h. For the algebra of h's in Q(j,Jn) under study, let Sfc = {permutations i —► /(i) | (11.117) holds for h}. If {hi,... >hm] is a finite set of such fc's, then for each c £ Q there is a permutation that works for hi + ch2 + c2/i3 -f • • • 4- cm"lhm. Since Q is infinite, there are m values of c, say ci,...,cm, that work with some common permutation /(i). Then we have /l ci 1 c2 \l cm „m—1 Cl „m-l „m-l MvKtt*(r)) - /i2(a/(0^(r)) I [ 0 ) U(^W)-M«Ko^))i ^ Since the Vandermonde matrix on the left is nonsingular, we see that z(i) is in sfclrv..nSAm. Thus for every finite set {hu ..., ftm}, Sfcin---n 5/»m is nonempty. Choose a finite set {hi,..., hja } for which Shxf\- '^ShM has the smallest possible cardinality. Then we have p| sh = shlr\---r\shM /0. all h For any member of the left side, (11.115) is valid for all h. This completes the proof of the lemma. 12. Match of L Functions The proof of Theorem 11.74 is now complete except for part (e), which says that the L functions of E and / coincide as Euler products except
384 XI: EICHLER-SHIMURA THEORY possibly at finitely many primes. This is the characteristic p part of the proof. Carrying out the details would involve too much further preparation in algebraic geometry, and we shall settle for some discussion. If p is one of the primes for which (e) is asserted, the idea is to prove an identity of the type fp = <(> + $ (11.118) in Xq(N)} where Tp is a reduction modulo p of the operator T(p) of (11.95) and where <j> is the Frobenius map. Before trying to make sense of (11.118) in general, let us consider the special case where Xo(N) has genus 1. (The first such example is when N = 11.) Then we can identify XQ(N) with the elliptic curve E produced by Theorem 11.74. From §V.2 it is meaningful to speak of Epy the reduction of E modulo p. In EPi (11.118) becomes an identity within End(Ep)y with Tp acting as [cp] in accordance with (b) of the theorem. Also within End(2?p) we have [#£(Zp)] = [deg([l]-4)] = ([l]-£)° ([!]-<») = [l]-{<j> + $) + \p], the last equality holding by item (c) after (11.82) and by Proposition 11.65 again. Thus <l> + $=\p+l-#E(lp)) = [ap], in the notation of (10.7). In other words, (11.118) as an identity within End(Ep) says that [cp] = [flpj. It can be shown that End(Ep) has no zero divisors, and hence cp = ap. This equality says that the pth factor of L(s, /) coincides with thepth factor of L(s,E). It is not hard to imagine a similar interpretation of (11.118) within a reduction modulo p of J in the case of general Xq(N). The identity passes to Epy and we get the same conclusion. In fact, reduction modulo p of J was too technically difficult to be the method that was used. Instead (11.118) was proved in Xo(N)y and then some consequences were transferred to J and E\ the identity itself was not transferred. We shall not pursue this transfer of consequences. The meaning that is customarily attached to (11.118) is as an identity of "correspondences" on Xo(N)p, the reduction modulo p of X0(N) by (11.88) by Proposition 11.65
12. MATCH OF L FUNCTIONS 385 (which we have not defined in genus > 1). Intuitively a correspondence is like the graph of a function, except that the function need not be single-valued, and one must be careful in counting multiplicities. More precisely but still not exactly, the building blocks of correspondences in X0(N)p are irreducible subvarieties of dimension 1 in Xo(N)p x Xo(N)p (curves, in other words), and a correspondence is a divisor-like Z combination of such irreducible curves. Both Tp and <f> are to be regarded as correspondences, and <fi refers to the correspondence <j> with the two coordinates interchanged. The sum on the right side of (11.118) is the sum in the sense of divisors. We shall not go into the proof of (11.118) but shall end our discussion of the characteristic p part of the proof at this point.
CHAPTER XII TANIYAMA-WEIL CONJECTURE 1. Relationships among Conjectures Throughout this chapter, E will denote an ellliptic curve defined over Q. By Corollary 10.6 the L function L(s,E) converges for Re s > § and is given there by an absolutely convergent Dirichlet series. It is expected that deep arithmetic information is encoded in the behavior of L(s, E) beyond the region of convergence. For example, the Birch and Swinnerton-Dyer Conjecture (Conjecture 1.10) predicts that the rank of E(Q) is the order of vanishing of L(s, E) at s = 1. To make sense of such conjectures, we must expect that L(s, E) has an analytic continuation. In view of the behavior of Dirichlet series that are better understood, it is natural to expect also that L(s, E) will satisfy a functional equation. Here is a reasonably precise formulation. Conjecture 12.1. The function L(s,E) extends to be entire, and for a suitable sign e and a suitable positive integer AT, L(s, E) satisfies a functional equation given in terms of A(s,E) = Ns!2(2ir)-sr(s)L(s,E) (12.1) by A(syE) = -eA(2-s)E). (12.2) Hasse had the further idea that L(s,E) should have as good transformation properties as any other reasonable Dirichlet series. In the case of L(s, f) for a cusp form / 6 Sk(To(N)) in one of the eigenspaces SI(Tq(N)) of Wtf, there is more that can be said beyond Theorem 9.8. For one thing, by examining (9.27), (9.36), and (9.37), we see that the entire function \(s, /) = N'<2(2*)-T(s)L(s, f) (12.3) is actually bounded in vertical strips. For another thing, let us consider some modifications of L(s, /). Write ^./) = £^- (124a) n = l 386
1. RELATIONSHIPS AMONG CONJECTURES 387 let m be an integer relatively prime to Ny let \ be a primitive Dirichlet character modulo m, and let ^./.x)=f;^- (12.4b) Define A(s, /, X) = (m2Ny'2(2x)-°T(S)L(S, f, x). (12.4c) Recall from (7.46) the Gauss sum m-l cKx)=^2*x(|). (12.5) The following theorem is proved by combining the techniques for Theorem 7.19 and Theorem 9.8. Theorem 12.2. Let / be in S|(r0(A0), let GCD(m,JV) = 1, let X be a primitive Dirichlet character modulo m, and let L(s,/, x) and A(5>/>x) be defined by (12.4). Then L(syf,x) is initially defined for Re s > f -t-1 and extends to be entire in s. Moreover, A(s, /, x) is entire and bounded in vertical strips, and it satisfies the functional equation A(.,/,x) = £(-l)^^Xy)A(*-.,/,x), where c{m,x) is the Gauss sum (12.5). The Hasse-Weil Conjecture expects L(s} E) to share all these properties. It is given in terms of L(s,E) = Y^=i ~~7 an(J ifcs modifications ^^.x) = E~1£^- Conjecture 12.3 (Hasse-Weil). The function L(s,E) extends to be entire and, for a certain positive integer TV, so does L(s,£,x) for every Dirichlet character whose conductor m is prime to TV. Moreover, the modified functions A(«, E) = Ns'2(2ir)~sr(s)L(s} E) A(syEiX) = (m2TVr/2(27r)-T(5)I(5,E,X)
388 XII: TANIYAMA-WEIL CONJECTURE extend to be entire and, for a suitable sign £, satisfy the functional equations A(s, £) =-£A(2 -*,E) Weil addressed the question how close £(«,£") is to coming from a modular form if it satisfies functional equations as in Conjecture 12.1 or 12.3. Suppose that L(s,E), adjusted by factors as in (12.1), is really the Mellin transform of /(*<r), where / is an analytic function on the upper half plane Ji transforming under a group T of linear fractional transformations. The fact that we are obtaining L(s,E) from a Mellin transform incorporates a transformation law under the translation corresponding to T = ( o 1 ) • ^ne functional equation (12.2) incorporates a transformation law under some inversion element, such as the one from aN = (jv V J* Conjecture 12.1 does not give any reason for thinking that T is any larger than the group V generated by the transformations corresponding to T and a^. Unfortunately r'\W* is noncompact and not close to compact. So V does not give us much control over an analytic function on 7{. Weil found, however, that the additional functional equations (12.6) are enough to force matters to be controlled by T0(N). The statement of the Weil Converse Theorem is as follows. We shall not give the proof. Theorem 12.4 (Weil). Let L(s) = £~=1 ^ be a Dirichlet series with cn = 0(nc) for some c > 0. Fix a positive integer N, an even positive integer fc, and a sign e. Suppose that (a) the function A(«) = N*'2{2ir)-8r(s)L(s) is entire, is bounded in every vertical strip, and satisfies A(s) = e("l)k/2A(k-s)i (b) for every integer rn with GCD(m, N) = 1 and every primitive Dirichlet character \ modulo m, the modified function M.) = £^
1. RELATIONSHIPS AMONG CONJECTURES 389 is such that the function Ax(«) = (m>NY'l(2n)->r(S)Lx(s) is entire, is bounded in every vertical strip, and satisfies (c) the series defining L(s) converges absolutely at s = k — 6 for some *>0. Then f(r) = f2c^inT n=l is a cusp form in S*(r0(iV)). In fact, it is not necessary to assume (b) for quite so many m's, but this improvement will not concern us. Theorems 12.2 and 12.4 show that the Hasse-Weil Conjecture is completely equivalent with the following fundamental conjecture. Conjecture 12.5. If E is given, then there exist a positive integer N and a sign e such that the function L(s,E) equals L(s,f) for some eigenform / in 5|(r0(AT)). The notion of newform was not known at the time of Weil's work, but we can freely substitute the word "newform" for "eigenform" in Conjecture 12.5. In fact, if / is an eigenform for S|(r0(Af)), ^ comes from a newform for some S^Tq^/M)) with M dividing N. If M / 1, then Theorem 9.8 gives two contradictory functional equations that / must satisfy. Thus M = 1, and / is a newform. Conjecture 12.5 is consistent with an important theme that has been discussed on more than one occasion earlier in this book: the identification of geometric L functions as automorphic L functions. The conjecture raises two immediate questions: What are N and e? Is there a deeper relationship between E and /? Eichler-Shimura theory gives part of the answer to the second question. Starting from any normalized newform / in S2(r0(N)) whose Fourier coefficients are integers, the theory provides us with a mapping from Xq(N) to a canonical elliptic curve over Q. Even before this theory, Taniyama had suggested looking for such a mapping in connection with an arbitrary elliptic curve over Q, but Taniyama's suggestion was apparently initially overlooked. Weil made a similar suggestion as the Eichler-Shimura theory unfolded.
390 XII: TANIYAMA-WEIL CONJECTURE Conjecture 12.6 (Taniyama-Weil). If E is given, then there exists an integer N for which there is a nonconstant morphism F : Xq(N) —► E. We say that such an E has a modular parametrization of level N, and, following, custom, we call E a Weil curve. If E is given by the usual equation (3.23), then it is equivalent to say that there are functions f (r) and tj(t) in Ao(ro(N)) whose q expansions at oo have rational coefficients and are such that (x, y) = (£(r), t){t)) satisfies (3.23b) for all t where neither £ nor r\ has a singularity. The passage from F to (£,17) is just a matter of unwinding the definitions; the passage from (f, 77) to F requires Corollary 11.50. The modular parametrization in the Eichler-Shimura theory has additional properties. From a newform / of level N, Theorem 11.74 produces a morphism from Xo(N) to an elliptic curve E over Q such that the pull- back of the invariant differential is a multiple of /(r) rfr and the period lattice of J f{r)dr is equivalent with the period lattice of E. If E' is isogenous to E over Q, one can compose the Eichler-Shimura map with the isogeny to get a modular parametrization of E'', but the match of the period lattices may be lost. We saw in an example following Corollary 11.50 that there is a morphism from Xo(MN) to Xq(N). The Eichler-Shimura mapping to a curve E can be composed with such maps to produce modular parametrizations of E of nonminimal level. Theorem 12.7. If E has a modular parametrization of level N but of no level M for M < JV, then E comes via the Eichler-Shimura theory from the map defined by a normalized newform in S2(r0(N)), followed by an isogeny defined over Q. In constructing an elliptic curve E from a normalized newform / £ S2(ro(Ar)), the Eichler-Shimura theory shows that the pth L factor of E agrees with the pth L factor of / for all but finitely many p> and Igusa's work shows that any exceptional p must divide N. There are two loose ends—to identify N and to handle the exceptional p's. There was an early suspicion of what N was. The conductor N of E is an isogeny invariant of E with a cohomological definition, but it can be described almost completely in a simple way as follows. First let us adjust E so that it is given by a global minimal Weierstrass equation (§X.l). Then let A be the discriminant of E. The conductor N divides A and has the same prime factors as A. The power of a prime dividing TV" is 1 if and only if Ep has a node. If p > 3, then the power of p dividing N is 2 if and only if Ep has a cusp. For the case of a cusp with p = 2 or
1. RELATIONSHIPS AMONG CONJECTURES 391 3, the conductor can be computed by an algorithm due to Tate. Those curves in Tables 3.1 and 3.2 whose conductor is not obviously |A| are listed in Table 12.1. Elliptic Curve y2 + y = x3 y2 + xy- y = x3 y2 = x3 — x2 + x y2 = x3 + x2 + x \y2 + 3xy - y = x3 y2 + xy = x3 + x y2 = x3 + x y2 = x3 - x y2 = x3 - 2x + 1 y2 = x3 - 2x - 1 y2 = x3 + x2 - x y2 = x3 — x2 — x y2 + bxy + y = x3 A -27 -28 -48 -48 -54 -63 -64 64 80 80 80 80 98 N 27 14 24 48 54 21 64 32 40 80 20 80 14 TABLE 12.1. Conductors of some elliptic curves The following theorem ties up both loose ends in the Eichler-Shimura theory. Theorem 12.8 (Carayol). Let / 6 S2(ro(A0) be a normalized newform whose Fourier coefficients are integers, and let E be the elliptic curve over Q associated to / by Eichler-Shimura theory. Then L(s,E) = L(s,f), and N is the conductor of E. It follows that the Taniyama-Weil Conjecture implies the Hasse-Weil Conjecture. In fact, if E is given, we choose N to be the smallest positive integer such that E has a modular parametrization of level N. Theorem 12.7 says that E comes from the map defined by a normalized newform / in S2(r0(N)), possibly followed by an isogeny over Q. Since isogeny over Q does not affect the L function (Theorem 11.67), Theorem 12.8 says that L(s,E) = L(s,/). This is the conclusion of Corollary 12.5, which we have seen is equivalent with the Hasse-Weil Conjecture. We can come close to proving that the Hasse-Weil Conjecture implies the Taniyama-Weil Conjecture. Namely let E be given with conductor N. By the Hasse-Weil Conjecture in the form of Conjecture 12.5, there
392 XII: TANIYAMA-WEIL CONJECTURE is a normalized newform / in S2(To(N')) such that L(syE) = L{s,f). By the Eichler-Shimura construction and Theorem 12.8, / leads to an elliptic curve E1 of conductor N' with L(s,E') = L(syf). Since L(s, E') = L(sy E), a theorem of Serre allows us to conclude that E and Ef are isogenous over Q provided j(E) is not an integer. In this case the modular parametrization for E yields one for E1 by composition. 2. Strong Weil Curves and Twists Let E and E' be Weil curves, and let F : X0(N) -> E and F' : Xq(N) —♦» Ef be modular parametrizations, with N minimal in both cases. Without loss of generality, we suppose that oo of Xq(N) maps to the group identity of E and E1. We say that (EyF) dominates (E',Ff) if F' factors through F, i.e., if F' = <p o F for an isogeny ^>. A Weil curve E is a strong Weil curve if (E,F) is maximal in this ordering for some F. (The F yielding maximality will not be unique, as it can always be composed with —1 in E.) Proposition 12.9. In every isogeny class of Weil curves, there is a strong Weil curve and it is unique up to isomorphism. The strong Weil curve and map {E,F) are characterized within the isogeny class by either of the following conditions: (a) the induced map F* on homology Hi(Xo(N),l) —> H\(E,Z) is onto. (b) when F is factored through the Jacobian variety as F\ o $, the kernel of F\ is a (connected) variety. Example. In §XI.l, X0(ll) led to three Weil curves, of which the first two were E : y2 + y = x3 - x2 - 10* - 20 (12.7) and E': y2 + y = x3-xy (12.8) defined by respective lattices A and A' in C. The lattice A was defined as the image of the cycles of Xq(N). Hence H\(Xo(N),Z) maps onto H\(EyT), and (12.7) is a strong Weil curve. We identified E1 by using the sublattice A', which does not include the image of the cycles of JYo(N), and then used a dual isogeny to map E to E'. It would have amounted to the same thing if we had used ^A', which properly contains A, to define E1. In this case the cycles of Xq(N) do have image in ^A', but the homology map is not onto. Thus E' in (12.8) is not a strong Weil curve.
2. STRONG WEIL CURVES AND TWISTS 393 The Eichler-Shimura construction uses the full image of cycles of Xq(N) as the set of periods to define an elliptic curve, by Theorem 11.74d. Thus it always leads directly to a strong Weil curve. What we see computationally from the Eichler-Shimura construction is a lattice in C, from which we can compute the j-invariant approximately, as in §XI.l. Even if we assume that our approximation to the j-invariant is an approximation to a particular rational number, we have not completely solved the problem of identifying the strong Weil curve as a particular curve over Q. There remains the problem of distinguishing a curve over Q from its twists, as we saw by example in §XI.l. We now take up the question of twists more generally. Two elliptic curves E and E1 over Q are called twists of each other if j(E) = j(E'). We shall skip over the cases that the common value of j is 0 or 1728, which require special treatment. We may assume that E and E1 are in global minimal form (3.23b) with respective coefficients ai,...,ae and ai,...,^ as usual. Since j is not 0 or 1728, none of C4,C4,C6,c'6 is equal to 0. Put v - A/A' 6 Q. Since j(E) = j(E') and j(£) = ! = 1728+|, we see that = — = ii = ii V A' c43 eg' It follows that v = u6 for some u = - in Q with GCD(r, s) = 1. Choosing 5 as > 0 and the sign of r to match the sign of ce/c^y we have (S2C4,*3c6>56A) = (r24rV6)r6A'). (12.9) By Lemma 10.1 and Theorem 10.3, neither r nor s has any prime square factor p2 with p > 3. Conversely if E and E1 are related as in (12.9), then j(E) = j(E'). Proposition 12.10. Let E and E1 be twists of each other, related as in (12.9), and let their L functions have respective coefficients ap and a^ If p is a prime not dividing 6AA', then where ( J is the Legendre symbol.
394 XII: TANIYAMA-WEIL CONJECTURE Proof. Let Np and Np be the respective numbers of affine solutions modulo p. In view of (10.7), we are to show that Np = N'p if ( J = 1 and that Np + Np = 2p if ( rs J = — 1. Since p is prime to 6, we may take the two equations to be y2 = x3 - 27c4x - 54c6 (12.10a) y2 = x3 - 27c'4x - 54c'6. (12.10b) Suppose ( ) = 1. Choose a £ Z with ra2 = 5 mod p. If (xo,yo) solves (12.10a) modulo p, then .2 /r\3 J c^o-54(- shows that !/g=*2-27(J) ^o-54 0 4 (^ar-^o)2 = (sr-^o)3 - 27c'4(sr-1x0) - 54^, i.e., that (sr~1x0,sar~ly0) solves (12.10b) modulo p. Thus JVp < Npi and similarly Np < Np. Now suppose that ( I = — 1. In the same way, we check, for each fixed xo, that the sum of the number of solutions (xo, y) of (12.10a) and the number of solutions (sr~lxo)y) of (12.10b) is 2. Summing on xo, we obtain Np + N'p - 2p. By quadratic reciprocity, Proposition 12.10 says that L(syEf) in almost all factors equals L{s,E,x) f°r a quadratic Dirichlet character modulo Ars. Taking Theorems 12.2 and 12.4 into account, we see that Proposition 12.10 almost proves that the question whether E is a Weil curve depends only on j{E). 3. Computation of Equations of Weil Curves How can we check definitively whether a particular elliptic curve is a Weil curve? And how can we write down equations for the Weil curves that arise from Xq(N)? Let us first address these questions for strong Weil curves.
3. COMPUTATION OF EQUATIONS OF WEIL CURVES 395 The classical method of proof is to exhibit £(r) and t)(r) in Aq(To(N)) such that (£(r),77(r)) parametrizes the curve. This method requires constructing some cusp forms, especially of weight 2, and manipulating identities to get the right results. Constructing cusp forms of weight 2 is a problem in its own right that we address below. In any event, the classical method has been carried out in genus 1 and for some higher cases, particularly in genus 2. It becomes really tedious as the genus increases. \9 0 1 2 N 1< N < 10; 12, 13, 16, 18, 25 11, 22, 14, 23, 15, 26, 17, 28, 19, 29, 20, 31, 21 37, 24, 27, 50 32, 36, 49 Table 12.2. Curves Xo(N) of low genus In the case that the conductor is of the form N = 2a3\ one knows all possible elliptic curves by classifying all relevant integer solutions of 1728A = C4 — c%. All such elliptic curves are Weil curves. To prove this, we start with an elliptic curve of conductor N} we compute many coefficients ap from (10.7), and we check that they correspond to the first part of the q expansion of a newform in S2(r0(A0) (see below). Then using the method of §XI.l, we can compute to several decimal places the lattice in C generated by images of cycles of Xo(N). Next we can compute the period lattice of the given elliptic curve as in Chapter VI, using the Gauss arithmetic-geometric mean, and we can use some Cx multiple of it to organize the information about the lattice in C. Once we know a basis for the images of cycles of Xo(N)) we can compute 02, 03, and ,; as in §XI.l. If the computed j does not match j for the elliptic curve we have in mind, then the elliptic curve that we have in mind is not a strong Weil curve. But it may still be an ordinary Weil curve, and a method for approaching that question will be discussed later. The difficulty now from the point of view of proof is in identifying the computed j as an exact rational number. We have it only to several decimal places. But when N = 2a36, the classification of curves of conductor N has bounded the denominator of j, which is A. Thus it tells us the value of j as a rational number, provided we have computed enough decimal places.
396 XII: TANIYAMA-WEIL CONJECTURE In the case of conductor N = 2a36 we can therefore tell exactly whether our computed j matches j for the given elliptic curve. If so, the curves are twists of one another. Since the discriminant can have only 2 and 3 as prime factors, the set of possible twists is quite limited. Using Proposition 12.10, we can check for the correct Fourier coefficients. Recent progress allows use of a similar technique in the case of prime conductor, and it should be expected that further progress will expand the method. The following theorem bounds the denominator of j in the case of prime conductor and allows us to proceed as above. Theorem 12.11 (Mestre-Oesterle). If a Weil curve E is in global minimal form and its conductor N is a prime, then the discriminant A of E divides N5. The problem of writing down all ordinary Weil curves coming from Xo(N)) with no side information, is a hard one. The problem is that it is easy to miss a member of an isogeny class. Instead let us address the question of proving that a particular curve E of conductor N is a Weil curve. We start as above by computing many coefficients ap from (10.7) and checking that they come from an initial segment of a newform in S2(Fo(N)). We form the lattice of E and the lattice from X0(N). Even if they do not match (after some multiplication by Cx), it may happen that the lattice from E corresponds to a sublattice of the images of cycles of Xq(N). Let n be the index of the lattice from E in the lattice from X0(N)y and let E1 be the strong Weil curve. The goal is to find an isogeny of degree n from E to E1 or from E1 to E. We shall not discuss the details of how to proceed, but there are formulas that help in the effort. Success in proving that particular elliptic curves are Weil curves depends on being able to produce initial segments of q expansions of new- forms. Although the methods of §IX.3 can be a little helpful, ultimately one has to come to grips with an actual construction of newforms. For low conductor the standard technique is to compute the action of the Hecke operators on Hx(Xq(N),Z) and then on Hf(XQ{N),C). (See §XI.5.) Proposition 11.24b yields matrices representing T2(n) in some basis. Computation of a basis of Hi(X0(N),l) takes some time but is manageable. A second technique is to produce initial segments of q expansions of eigenforms by using trace formulas for the Hecke operators. Once the trace is known for T2(n), one can compute the characteristic polynomial and then the eigenvalues; from the eigenvalues, one can piece together the q expansions of the eigenforms.
4. CONNECTION WITH FERMAT'S LAST THEOREM 397 To obtain the characteristic polynomial of T2(n), we use an observation from linear algebra. If L is a linear transformation on an r dimensional complex vector space with characteristic polynomial r det(XI -L) = Y[(X -\j) = Xr- Cr-iXr-1 + • • • + (-1)%, (12.11) i=i then Ci is the ith elementary symmetric polynomial in Ai,..., Ar and is a linear combination of the symmetric polynomials A{ + X{ + • • • + \{ = TV(L'), 1 < j < r. For L = T2(n), Theorem 9.17 allows us to express T^riy as a linear combination of the operators T2(m) for which m | n. Hence (12.11) can be computed from the traces of the Hecke operators. Once we have the eigenvalues, we have to correlate the effects of the operators T2(n) as n varies. For example, if the space is 2-dimensional and T2(3) has a and b as eigenvalues and T2(5) has c and d as eigenvalues, then we can look at T2(10) to see what the simultaneous eigenvalues are. If T2(10) has eigenvalues ac and 6rf, then the simultaneous eigenvalues are {a,c} and {6, d}\ otherwise they are {a,d} and {6,c}. Everything is easier if we have a candidate for an initial segment of an eigenform, such as the corresponding coefficients of an elliptic curve. Then we just have to check that the coefficients solve (12.11) and satisfy recursions consistent with Theorem 9.17. 4. Connection with Fermat's Last Theorem Suppose a' +/?' = yl is a relatively prime counterexample to Fermat's Last Theorem, with / prime and > 5. The elliptic curve E over Q given by E: y2=x(x-a')(x-y') (12.12) has A = 16a2'/?V (12.13) c4 = 16(a2'-aV + 72'). Let us put E in global minimal from and compute its conductor N. Any odd prime p dividing A divides a(3y. Since a,/?,7 are relatively prime in pairs, we see that p does not divide C4. By Lemma 10.1, E is minimal
398 XII: TANIYAMA-WEIL CONJECTURE at p. If p | ay} we readily check that Ep has a node at (0,0), and if p | /?, we readily check that Ep has a node at (a/,0). Thus ^ = 2P°wer Y[ P- (1214) p odd We still have to check p = 2. Since 24 || C4, at most one reduction is possible. We may assume that 7 is even, so that y1 = 0 mod 32. (12.15a) Since /?' = —a' mod 4 and / is odd, we may assume, possibly by interchanging a and /?, that a1 = 1 mod 4. (12.15b) Putting x — 4X and y = 8Y + 4X, we are led to Y2 + XY = X3 + J(l - otl - V)*2 + jfeaV*. (12.16) The congruences (12.15) show that the coefficients here are integers (so that (12.16) is a global minimal form) and that the equation modulo 2 y2 + xy-Ix* The singularity is at (X,Y) = (0,0); since neither Y2 + XY nor y2 + XY ■+• X2 is a square, the singularity is a node. Taking (12.14) into account, we see that the conductor is given by N = JJ p. (12.17) p\apy We shall use the following theorem. Theorem 12.12 (Ribet). Suppose E is an elliptic curve over Q given in global minimal form and having discriminant A = Ylp\& P6p anc* con~ ductor TV = Ylp\&P*p- Suppose further that E is a Weil curve with a modular parametrization of level N given via a normalized newform f(T) = <I + £^=2 c"?n in S2(r0(N)). Fix a prime /, and let Ni = -^—. (12.18) n p p with and l\6p Then there exists fx in S2{T0(Nl)) such that /2(r) = £~=i d^n with integral coefficients and with cn = dn mod / for 1 < n < 00.
4. CONNECTION WITH FERMAT'S LAST THEOREM 399 Corollary 12.13 (Frey-Serre-Ribet). The Taniyama-Weil Conjecture implies Fermat's Last Theorem. Proof. Assuming that Fermat's Last Theorem is false, we apply Theorem 12.12 to the curve E in (12.12). Comparison of (12.17) and (12.18) shows that Ny - 2. The theorem produces /i in 52(r0(2)), which is the 0 space. Thus dn = 0 for all n. But c\ = 1. So c„ = dn mod / fails for n = 1, and we have a contradiction.
NOTES Chapter I The organization of Chapter I is based on Zagier [1988] and is influenced also by the Introduction of Husemoller [1987]. References to papers before 1940, among others, may be found in Shimura [1971a] and Serre [1973a]. For Theorem 1.1, see Borevich and Shafarevich [1966]. Cubic counterexamples to the Hasse Principle were studied by Selmer [1951]. Theorem 1.4 is discussed in Chapter III. What we have called the "Weierstrass form" of a cubic is really Tate's modification of the Weierstrass form. The first ten chapters of the book work in the context of plane projective geometry, and our definition of "elliptic curve" (as a nonsingular plane cubic curve in Weierstrass form) is appropriate for that context. In Chapters XI and XII, and usually in algebraic geometry, an elliptic curve is a nonsingular projective curve of genus one, and one proves that all such curves can be realized in Weierstrass form. See §XI.9. Theorem 1.5 is the subject of Chapter IV, and Theorem 1.6 is the main result of Chapter V. Theorem 1.7 appears in Mazur [1977] and [1978]. For Theorem 1.8, see Chapter X. Conjectures 1.11 and 1.12, as well as Theorem 1.13, are discussed in Chapter XII. Conjecture 1.10 appeared originally in Birch and Swinnerton-Dyer [1963-65]. For more recent numerical evidence, see Brumer and McGuin- ness [1990]. There are several expositions of Conjecture 1.10 in the literature. See Swinnerton-Dyer and Birch [1975], Zagier [1984], Appendix C of Silverman [1986], Chapter 17 of Husemoller [1987], and Chapter 20 of Ireland and Rosen [1990]. Chapters II-V The standard book on elliptic curves from an arithmetic standpoint is Silverman [1986]. Three other books on this subject are Koblitz [1984], Husemoller [1987], and Charlap and Robbins [1988]. Koblitz concentrates on elliptic curves with complex multiplication, especially the curves y2 = x3 — n2x that arise in connection with congruent numbers. Some books on curves are Brieskorn and Knorrer [1986], Fulton [1989], and Walker [1950]. Chapter II is based chiefly on Husem6Iler [1987] and Walker [1950]. The examples at the beginning of Chapter III are from Zagier [1988]. The standard notation is as in Tate [1974] and Silverman [1986]. Tables 401
402 NOTES 3.1 and 3.2 are the result of a simple home-computer calculation. The idea of the proof of Theorem 3.8, as we give it, is a standard one and may be found, for example, in Husemoller [1987]. Usually the proof in this form is not carried through to completion. The discussion of singular points is based on Silverman [1986]. Most of Chapter IV is from Zagier [1988]. The examples were in Zagier's lectures. Proofs of most of the results in §9 may be found in Ireland and Rosen [1990]. Theorem 4.11 goes also under the name Mordell-Weil Theorem. Mordell [1922] proved the theorem as given here, and Weil [1930] extended the result to elliptic curves defined over number fields. In Chapter V, Theorem 5.1 is due in a different form to Nagell [1935] and in essentially the form here to Elisabeth Lutz [1937]. Our treatment is similar to that of Lutz, as modified in Chapter 5 of Husemoller [1987]. The examples are based on Silverman [1986] and Husemoller [1987]. Theorems 5.2 and 5.3 are due to Nagell [1935] and Fueter [1930], respectively. Our attention was brought to these results by Lichtenbaum [1960], and we give Lichtenbaum^ proofs. The material of §5 is based on Husemoller [1987]; Husemoller credits H. P. Kraft, F. Knopp, G. Menzel, and E. Senn. See also Exercise 8.12 of Silverman [1986]. Chapter VI The material in §§1-3 is fairly standard and can be found in many places. For example, there are individual chapters in Lang [1987], Silverman [1986], and Husemoller [1987] on this topic. Sections 5-8 are based on Siegel [1969]. The material in §9 is from Zagier [1988]. Gauss's use of the arithmetic-geometric mean extended to complex numbers and is applicable in evaluating period integrals when the roots of the cubic are not all real. See Cox [1984] for an exposition. Chapter VII The definitive book about the Riemann zeta function is Titchmarsh [1951]. Dirichlet series are discussed in many analysis books. For Euler products, see pp. 217-220 of Husemoller [1987]. The material on Dirichlet's Theorem is taken from Serre [1973a]. Further motivation in terms of the zeta function of a number field may be found in Hecke [1981]. For the definition and properties of the gamma function T(s), see Ahlfors [1966]. For the elementary properties of the Fourier transform on the line, see Stein and Weiss [1971]. The results of §5 have several points of contact with §3.6 of Shimura [1971a]. The passage from the transformation laws of a function like 6{t) to the functional equation of an associated Dirichlet series is one of the main themes of Ogg [1969a].
NOTES 403 Chapters VIIMX Modular forms appear in many books. The definitive book is Shimura [1971a]. Other full-length books on the subject in the spirit presented here are Gunning [1962], Ogg [1969a], Lang [1976], and Miyake [1989]. In addition, there are chapters on the subject in Apostol [1976], Koblitz [1984], and Serre [1973a], and there are summaries in Gelbart [1975], Silverman [1986], and Husemoller [1987]. Our presentation of much of the material in Chapter VIII is based on Serre [1973a]. For §5 and §8, see also Ogg [1969a]. Our proof of Corollary 8.9 follows Siegel [1954]; for a different proof that gives different extra information, see Serre [1973a]. Some authors define weight differently. Also Hecke operators have more than one standard definition. Hecke operators, when defined as operators given by one double coset, as in Shimura [1971a], are slightly different from Hecke operators as we have defined them. The subgroup To(N) is treated in only a few places. The term "Hecke subgroup" follows Weil [1971b] and Gelbart [1975]. Our presentation is a reinterpretation of Ogg [1969a], Lang [1976], and Atkin and Lehner [1970]. The expositions by Ogg [1973] and Zagier [1990] are helpful also. For more detail about Example 4 in §IX.3, see Serre [1973] and Ogg [1973]. Example 5 is discussed in Ogg [1973], and a generalization is in Newman [1959]. In connection with §VIII.5 and §IX.4, see Ogg [1969a]. The details of the proof of Theorem 9.9 were omitted. For the Rie- mann surface structure, see Chapter 1 of Shimura [1971a] or Chapter 1 of Miyake [1989]. For the comparison of zeros and poles of meromorphic functions, see Proposition 11.5.4 of Farkas and Kra [1980]. The discussion of Theorem 9.10 mentions the Riemann-Hurwitz formula, which is given as Theorem 1.2.7 of Farkas and Kra [1980], and the fact that the space of holomorphic differentials has dimension <j, which is given as Proposition III.2.7. For Theorem 9.10, see §2.6 of Shimura [1971a]. The material of §IX.7 is taken from Atkin and Lehner [1970]. The proof of Theorem 9.22 may be found in that paper. Chapter X The material in §1 is due to Neron [1964]; see §VII.l and §VIII8 of Silverman [1986]. Our discussion in §2 of zeta functions and L functions is abbreviated. Silverman [1986] includes more detail in his Chapter V. The L function of an elliptic curve often has the name Hasse-Weil attached to it, and its shape is a motivating example for the Weil conjectures about varieties over finite fields. The final Weil conjecture generalizes Theorem 10.5, which is due to Hasse [1936]. We have given
404 NOTES a nonstandard elementary proof of this theorem, following Manin [1956]; Birch discovered an error in Manin's proof, and the error is corrected in Cassels [1957] and here. The standard proof, given in §§V.l and V.2 of Silverman [1986], is longer but has the advantage that its methods are applicable to other situations. For an exposition and history of Hasse's Theorem and the Weil conjectures and their proofs, see Katz [1976]. Chapter XI Eichler* Shimura theory refers both to the kinds of results in §§11- 12 and to a theory that seeks to generalize the material in §1 from S2(To(N)) to Sk(To(N)). The main references for the theme that leads to §§11-12 are Eichler [1954] and Shimura [1958], [1971a], and [1973b]. The article Swinnerton-Dyer and Birch [1975] includes a very helpful overview. For the generalization of §1, the references are Eichler [1957] and Shimura [1959]; Chapters V and VI of Lang [1976] give an exposition forSL(2,Z). In connection with §1, economical generators of Xo(N) are known if TV* is prime. This result is due to Rademacher [1930] and is quoted by Apostol [1990], p. 78. For the equations of E, E', and E" in §1, as well as the isogeny, see Velu [1971a]. See also Langlands [1990]. For proofs of the results in §2, see §1.5 of Shimura [1971a]. The material in §§3-4 about general Riemann surfaces is taken from Farkas and Kra [1980], and detailed proofs may be found there: For our Proposition 11.4, see Theorem II.5.1 of that book. For our Proposition 11.5, see Theorem III.2.7. Proposition 11.7 through Corollary 11.12 are in §111.4 of that book, Proposition 11.13 through Theorem 11.19 are in §111.6, and Theorems 11.20 and 11.21 are in §IV.ll. The invariant differential of an elliptic curve is treated in detail in Silverman [1986], pp. 52-53 and §111.5. The idea of the construction in the first part of §5 is in Swinnerton- Dyer and Birch [1975], and the proof has been assembled from arguments scattered throughout Shimura [1971a]. For example, our Proposition 11.22 is implicit in the proof of Shimura's Theorem 2.20, and many of the steps in the proof of Proposition 11.23 appear in §8.3 of Shimura's book. Shimura gives two proofs that the Hecke operators are given by integer matrices and thus have algebraic integers as eigenvalues. One proof uses the generalization of our §1 to weight k and appears on p. 84 and in Chapter 8 of Shimura [1971a]; the other uses what we have called Proposition 11.73 and is on pp. 171-172 of that book. Theorem 11.27 is a modification of Shimura's Theorem 3.51. The material in §6 through Theorem 11.33 and its lemmas is classical. Our treatment is based on §§3.3 and 5.2 of Lang [1987] and lectures of
NOTES 405 H. Matumoto. Theorem 11.36 is implicit in §§6.7 and 6.8 of Shimura [1971a]. Our treatment of general varieties and curves in §§7-8 is a merger of Chapter I of Hartshorne [1977] and Chapters I and II of Silverman [1986], and the reader is referred to those sources for proofs or references to proofs of results in algebraic geometry. In places where neither a proof nor a reference appears for such a result in either book, we have tried to give some indication of proof. For quoted results in commutative algebra, the reference is Zariski and Samuel [1958] and [I960]. The Hilbert Basis Theorem is on p. 201 of volume 1, and the Hilbert Nullstellensatz is on p. 164 of volume 2. Integral closure is discussed in pp. 254-265 of volume 1, and the result quoted to obtain (11.74) is on p. 265. Corollary 11.50 is implicit in Chapters 6 and 7 of Shimura [1971a]. The material in §9 is abstracted from Silverman [1986], largely Chapters II and III. For more information about isogenics, see Velu [1971b]. Lang [1983] treats abelian varieties in his §§1.1 and II. 1, and he introduces the Jacobian variety in §11.2. Proofs of Propositions 11.68 to 11.70 and of some of the assertions about Jacobian varieties may be found there. Historically Weil [1946] and [1948, reprinted in 1971a] reworked the foundations of algebraic geometry in part to deal rigorously with the Jacobian variety. Page 88 of Weil [1971a] remarks on the understanding of the Italian school of the relationship between correspondences and the Jacobian variety; using correspondences involves counting intersections properly, and this subject had never been pinned down. Lefschetz [1921] had constructed the Jacobian variety as a projective variety over C, and Weil's work constructed Jacobian varieties of curves defined over Q as "abstract varieties." Chow [1954] realized the Jacobian variety of a curve defined over Q as a projective variety over Q, and his paper contains the results we quote as Theorem 11.71. For Theorem 11.72, see Theorem VII of Chow [1949]. For Proposition 11.73, see §2.9, Proposition 9, of Shimura and Taniyama [1961]. For the theorem of Wedderburn given as Lemma 11.75, see pp. 63, 66, and 116 of Jacobson [1943]. Concerning Theorem 11.74, Eichler [1954] discovered (11.118) and used it to relate the L function of Xq(N) to the action of the Hecke operators. Shimura [1971a] adapted this algebraic argument and added the idea of looking at asubvariety of J to get a version of Theorem 11.74; see his Theorems 7.14 and 7.15. The theorem as we have given it is a sharpened version that appeared in Shimura [1973b]. Much of Shimura's proof, including the rationality of the Hecke operators, is couched in the language of adeles in order to gain generality; insight into what Shimura
406 NOTES is doing may be obtained from Chapter 7 of Lang [1987]. Shimura's proof of the equality of L functions uses his own general theory (Shimura [1955]) of reduction of varieties modulo p and is valid for all but finitely many primes. Igusa [1959] proves results about the reduction modulo p of Xq(N) implying that Shimura's argument is valid for all primes not dividing N. Chapter XII The presentation in this chapter, especially in §1, owes a great deal to Zagier [1988] and Swinnerton-Dyer and Birch [1975]. For Theorem 12.2, see §3.6 of Shimura [1971a]. Theorem 12.4 originally appeared in Weil [1967]; Chapter V of Ogg [1969a] gives an exposition. For the Taniyama- Weil Conjecture, see Taniyama [1957] and Weil [1967]. Theorem 12.7 is Theorem 4 of Swinnerton-Dyer and Birch [1975]. The notion of conductor dates to Artin and became manageable as a result of Ogg [1967]. An actual algorithm for computing it is the subject of Tate [1975]. Swinnerton-Dyer, Stephens, et al. [1975] assembled extensive tables of information about elliptic curves over Q with low conductor. Velu [1976] reported a small number of errors that had been discovered in the tables, including the omission of a page listing curves of conductors 121 to 124. Our Table 12.1 relies partly on those tables and partly on computation using Tate's algorithm. Miyake [1989] includes long tables of information about a few particular curves. Theorem 12.8 is due to Carayol [1986], p. 411. The arguments relating the Hasse-Weil Conjecture and the Taniyama-Weil Conjecture are in Swinnerton-Dyer and Birch [1975]. The end of our §1 alludes to Serre's Isogeny Theorem, which is on p. IV-14 of Serre [1968]. In §2, Proposition 12.9 is noted by Mazur [1973]. Twists appear in Atkin and Lehner [1970], Shimura [1973b], and Atkin and Li [1978]. See also §X.5 of Silverman [1986]. Ligozat [1975] works extensively with Xq(N) of genus 1, and the classical method of proving that curves are Weil curves can be seen there. Mazur and Swinnerton-Dyer [1974] show how TV* = 37 (of genus 2) can be handled, and Birch [1973] shows how N = 50 (of genus 2) can be handled. We have taken Table 12.2 from Mazur [1973], but it can readily be computed directly. Conductor 2°36 was treated in the unpublished Manchester thesis of F. B. Coghlan, and the numerical results are in Table 4 of Swinnerton-Dyer, Stephens, et al. [1975]. Theorem 12.11 is due to Mestre and Oesterle [1989]. In this connection, see also Theorem 1 of Coates [1970] and its interpretation in Theorem 8 of Tate [1974].
NOTES 407 A subject that this book has not treated is elliptic curves over Q with complex multiplication. All such curves are Weil curves, by a theorem of Deuring [1953-57]. The j-invariants of curves over Q with complex multiplication lie in a finite set. T>ace formulas for Hecke operators are the subject of Chapter 6 of Miyake [1989]. For original papers, see Eichler [1957], Selberg [1957], Eichler [1973], and Hijikata [1974]. The argument centered about (12.11) is taken from Miyake's book. Frey [1986] studied an equation more general than (12.12) and noted its relevance to Fermat's Last Theorem. Serre [1987b] studied this question at length and gave a conjecture whose truth would yield Corollary 12.13. Ribet [1990] proved enough of Serre's conjecture to obtain Corollary 12.13. We have followed Serre's paper in computing the conductor of (12.12). Connection with Representation Theory Langlands [1990] pulls together L functions, class field theory, elliptic curves, modular forms, infinite-dimensional representations of the general linear group, adeles, and automorphic representations, all in the space of 30 pages. His paper is a good one to start with, and a good one to return to. The idea that modular forms are connected with the representations of GL{2) seems to have originated with Gelfand and Fomin in the 1950's. Some expositions of the connection are in parts of Gelfand, Graev, and Pyatetskii-Shapiro [1969], Weil [1971b], Deligne [1973], Gelbart [1975], and Piatetski-Shapiro [1979]. In part, the connection is that one can identify cusp forms for Yq(N) in two stages with functions on groups. In the first stage the identification is with certain functions on SL(2,R) transforming on one side by Tq(N) and on the other side by the rotation subgroup. In the second stage it is with certain functions on G£(2, A), where A is the group of adeles of Q. The cusp forms are then intimately connected with the decomposition of the representation of GX(2,A) on L2 of the quotient space GL(2,Q)Za\GL(2, A), where Zf\ is the center of GL(2,A). Under this correspondence the Hecke operators end up with a surprisingly simple interpretation. Once this construction is complete, the objects of interest are pairs consisting of a certain kind of infinite-dimensional irreducible unitary representation of GL(2, A) and a finite-dimensional holomorphic representation of GL(2, C) (initially taken to be the standard representation). To such a pair the theory attaches an L function. To construct the L function, one recognizes the infinite-dimensional representation as a kind of infinite tensor product of representations of G£(2,R) and
408 NOTES GL(2,p-adics) for all p (see Flath [1979]), and one constructs a factor of the L function for each factor of the representation. The intention is to be guided by Hecke's classical theory, to obtain an analytic continuation and functional equation for the resulting L function, and to see connections with L functions in number theory. This program for GL(2) is carried out in detail in Jacquet and Langlands [1970]. For an exposition, see Robert [1973]. In retrospect, Tate's 1950 thesis (Tate [1967]) did the same thing for GL{\). This construction has become in part a way of generating automorphic L functions, and it lends itself to generalization. For generalization to GL(n), see Godement and Jacquet [1972]. For generalization to other kinds of groups, several things have to be brought to bear simultaneously. One can see them all at once in a preliminary form in Langlands [1970], and all at once in a later form in Langlands [1979]. For more detail one can consult Borel [1979], Borel and Jacquet [1979], and Tate [1979]. General sources for representation theory of reductive groups are Knapp [1986] for real groups and Silberger [1979] for p-adic groups. For a status report on construction of L functions, see Shahidi [1990]. The theory does more, however, than just produce L functions. It provides tools for working with them and bringing together the fields of mathematics in which L functions play a role. Three papers that explain such relationships are Gelbart [1984], Gelbart and Shahidi [1988], and Clozel [1990]. And, once again, one should look at Langlands [1990].
REFERENCES Conferences "Antwerp-Bonn": Modular Forms in One Variable, I-VI, Lecture Notes in Math. 320, 349, 350, 476, 601, 627 (1973-77), Springer-Verlag, Heidelberg Berlin New York. "Corvallis": Auiomorphic Forms, Representations, and L-functions, Proc. Symposia in Pure Math. XXXIII, parts 1-2 (1979), American Mathematical Society. "Ann Arbor": Clozel, L., and J. S. Milne, Auiomorphic Forms, Shimura Varieties, and L-functions /-//: Proceedings of a Conference Held at the Univesiiy of Michigan, Ann Arbor, July 6-16, 1988, Perspectives in Mathematics 10-11 (1990), Academic Press Inc., San Diego, Academic Press, Boston. "Motives": Proc. Symposia in Pure Math., American Mathematical Society, to appear. Books and Papers Ahlfors, L. V., Complex Analysis, McGraw Hill Inc., New York, 2nd edition, 1966. Apostol, T. M., Modular Functions and Dirichlet Series in Number Theory, Springer-Verlag, New York Berlin Heidelberg, 2nd edition, 1990 (1st edition, 1976). Atkin, A. O. L., and J. Lehner, Hecke operators on ro(m), Math. Annalen 185 (1970), 134-160. Atkin, A. O. L., and W. W. Li, Twists of newforms and pseudo- eigenvalues of W-operators, Invent Math. 48 (1978), 221-243. Birch, B., Some calculations of modular relations, Modular Functions of One Variable I, Lecture Notes in Math. 320 (1973), Springer- Verlag, Berlin Heidelberg New York, 175-186. Birch, B. J., and H. P. F. Swinnerton-Dyer, Notes on elliptic curves I and II, J. Reine Angew. Math. 212 (1963), 7-25; 218 (1965), 79-108. Borel, A., Automorphic L-functions, Auiomorphic Forms, Representations, and L-functions, Proc. Symposia in Pure Math. XXXIII, part 2 (1979), American Mathematical Society, 27-61. Borel, A., and H. Jacquet, Automorphic forms and automorphic representations, Automorphic Forms, Representations, and L-functions, Proc. Symposia in Pure Math. XXXIII, part 1 (1979), American Mathematical Society, 189-202. 409
410 REFERENCES Borevich, Z. I., and I. R. Shafarevich, Number Theory, Academic Press, New York, 1966. Brieskorn, E., and H. Knorrer, Plane Algebraic Curves, Birkhauser Verlag, Basel Boston Stuttgart, 1986. Brumer, A., and O. McGuinness, The behavior of the Mordell-Weil group of elliptic curves, Bull. Amer. Math. Soc. 23 (1990), 375- 382. Carayol, H., Sur les representations /-adiques associees aux formes modulaires de Hilbert, Ann. Scient. Ecole Norm. Sup. 19 (1986), 409-468. Cassels, J. W. S., Review of Yu I. Manin's, uOn cubic congruences to a prime modulus," Math. Reviews 18 (1957), 380-381. Coates, J., An effective p-adic analogue of a theorem of Thue II, Acta Arithmetica 16 (1970), 399-412. Charlap, L. S., and D. P. Robbins, An elementary introduction to elliptic curves, preprint, 1988. Chow, W., On compact complex analytic varieties, Amer. J. Math. 71 (1949), 893-914. Chow, W., The Jacobian variety of an algebraic curve, Amer. J. Math. 76 (1954), 453-476. Clozel, L., Motifs et formes automorphes: applications du principe de fonctorialite, Automorphic Forms, Shimura Varieties, and L-functions I: Proceedings of a Conference Held at the Univesity of Michigan, Ann Arbor, July 6-16, J988, , Perspectives in Mathematics 10 (1990), Academic Press Inc., San Diego, 77-159. Cox, D. A., The arithmetic-geometric mean of Gauss, L'Enseignement Math. 30 (1984), 275-330. Deligne, P., Formes modulaires et representations de GL(2), Modular Functions of One Variable II, Lecture Notes in Math. 349 (1973), Springer-Verlag, Berlin Heidelberg New York, 55-105. Deligne, P., and M. Rapoport, Les schemas de modules de courbes elliptiques, Modular Functions of One Variable II, Lecture Notes in Math. 349 (1973), Springer-Verlag, Berlin Heidelberg New York, 143-316. Deuring, M., Die Zetafunktion einer algebraischen Kurve vom Geschechte Eins I, II, III, IV, Nachr. Akad. Wiss. Gottingen (1953) 85-94, (1955) 13-42, (1956) 37-76, (1957) 55-80. Eichler, M., Quaternare quadratische Formen und die Riemannsche Vermutung fur die Kongruenzzetafunktionen, Arch, der Math. 5 (1954), 355-366. Eichler, M., Eine Verallgemeinerung der Abelschen Integrale, Math. Zeitschr. 67 (1957), 267-298.
REFERENCES 411 Eichler, M., The basis problem for modular forms and the traces of the Hecke operators, Modular Functions of One Variable I, Lecture Notes in Math. 320 (1973), Springer-Verlag, Berlin Heidelberg New York, 75-151. Farkas, H. M., and I. Kra, Riemann Surfaces, Springer-Verlag, New York Heidelberg Berlin, 1980. Flath, D., Decomposition of representations into tensor products, Automorphic Forms, Representations, and L-functions, Proc. Symposia in Pure Math. XXXIII, part 1 (1979), American Mathematical Society 179-183. Frey, G., Links between stable elliptic curves and certain Diophantine equations, Annales Universitatis Saraviensis, Series Mathematicae, 1 (1986), 1-40. Fueter, R., Ueber kubische diophantische Gleichungen, Comm. Math. Helv. 2 (1930), 69-89. Fulton, W., Algebraic Curves, Addison-Wesley, Redwood City, California, 1989. Gelbart, S. S., Automorphic Forms on Adele Groups, Annals of Math. Studies 83, Princeton University Press, Princeton, 1975. Gelbart, S. S., An elementary introduction to the Langlands program, Bull Amer. Math Soc. 10 (1984), 177-219. Gelbart, S., and F. Shahidi, Analytic Properties of Automorphic L-Functionsy Perspectives in Mathematics 6 (1988), Academic Press Inc., San Diego. Gelfand, I. M., M. I. Graev, and I. I. Pyatetskii-Shapiro, Representation Theory and Automorphic Functions, W. B. Saunders Co., Philadelphia, 1969. Godement, R., and H. Jacquet, Zeta Functions of Simple Algebras, Lecture Notes in Math. 260 (1972), Springer-Verlag, Berlin Heidelberg New York. Gunning, R. C, Lectures on Modular Forms, Annals of Math. Studies 48, Princeton University Press, Princeton, 1962. Hartshorne, R.; Algebraic Geometry, Springer-Verlag, New York Heidelberg Berlin, 1977. Hasse, H., Zur Theorie der abstrakten elliptischen Funktionkorper I, II, III, J. Reine Angew.. Math. 175 (1936), 55-62, 69-88, 193-208. Hecke, E., Lectures on the Theory of Algebraic Numbers, Springer- Verlag, New York Heidelberg Berlin, 1981. Hijikata, H., Explicit formula of the traces of Hecke operators for To(A^), J. Math. Soc. Japan 26 (1974), 56-82. Husemoller, D., Elliptic Curves, Springer-Verlag, New York Berlin Heidelberg, 1987.
412 REFERENCES Igusa, J., Kroneckerian models of fields of elliptic modular functions, Amer. J. Math. 81 (1959), 561-577. Ireland, K., and M. Rosen, A Classical Introduction to Modern Number Theory, Springer-Verlag, New York Berlin Heidelberg, 2nd edition, 1990. Jacobson, N., The Theory of Rings, Mathematical Surveys II (1943), American Mathematical Society. Jacquet, H., and R. P. Langlands, Automorphic Forms on GL(2), Lecture Notes in Math. 114 (1970), Springer-Verlag, Berlin Heidelberg New York. Katz, N. M.; An overview of Deligne's proof of the Riemann hypothesis for varieties over finite fields, Mathematical Developments Arising from Hilbert Problems, Proc. Symposia in Pure Math. XXVIII (1976), American Mathematical Society, 275-305. Knapp, A. W., Representation Theory of Semisimple Groups: An Overview Based on Examples, Princeton University Press, Princeton, 1986. Koblitz, N., Introduction to Elliptic Curves and Modular Forms, Springer-Verlag, New York Berlin Heidelberg, 1984. Lang, S., Introduction to Modular Forms, Springer-Verlag, Berlin Heidelberg New York, 1976. Lang, S., Abelian Varieties, Springer-Verlag, New York Berlin Heidelberg, 1983. Lang, S., Elliptic Functions, Springer-Verlag, New York Berlin Heidelberg, 2nd edition, 1987. Lang, S., Old and new conjectured Diophantine inequalities, Bull. Amer. Math. Soc. 23 (1990), 37-75. Langlands, R. P., Problems in the theory of automorphic forms, Lectures in Modern Analysis and Applications III, Lecture Notes in Math. 170 (1970), Springer-Verlag, Berlin Heidelberg New York, 18-61. Langlands, R. P., Modular forms and /-adic representations, Modular Functions of One Variable II, Lecture Notes in Math. 349 (1973), Springer-Verlag, Berlin Heidelberg New York, 361-500. Langlands, R. P., Automorphic representations, Shimura varieties, and motives. Ein Marchen, Automorphic Forms, Representations, and L-functions, Proc. Symposia in Pure Math. XXXIII, part 2 (1979), American Mathematical Society, 205-246. Langlands, R. P., Representation theory: its rise and its role in number theory, Proceedings of the Gibbs Symposium, Yale University, May 15-17, 1989, American Mathematical Society, 1990, 181-210.
REFERENCES 413 Lefschetz, S., On certain numerical invariants of algebraic varieties with application to Abelian varieties, Trans. Amer. Math. Soc. 22 (1921), 327-482. Li, W. W., Newforms and functional equations, Math. Annalen 212 (1975), 285-315. Lichtenbaum, S., Torsion Subgroups of Elliptic Curves, Undergraduate Honors Thesis, Harvard University, Cambridge, Massachusetts, 1960. Ligozat, G., Courbes modulaires de genre 1, Bull. Soc. Math. France, Supplement, Memoire 43 (1975). Lutz, E., Sur l'equation y2 = x3 — Ax — B dans les corps p-adiques, J. Reine Angew. Math. 177 (1937), 238-247. Manin, Yu I., On cubic congruences to a prime modulus, Izv. Akad. Nauk SSSR, Ser. Mat. 20 (1956), 673-678 (Russian), Amer. Math. Soc. Transl. (2) 13 (I960), 1-7. Mazur, B., Courbes elliptiques et symboles modulaires, Seminaire Bourbaki, vol. 1971/72, Exposees 400-417, Lecture Notes in Math. 317 (1973), Springer-Verlag, Berlin Heidelberg New York, 277-294. Mazur, B., Modular curves and the Eisenstein ideal, I.H.E.S. Publ. Math. 47 (1977), 33-186. Mazur, B., Rational isogenics of prime degree, Invent. Math. 44 (1978), 129-162. Mazur, B., and P. Swinnerton-Dyer, Arithmetic of Weil Curves, Invent. Math. 25 (1974), 1-61. Mestre, J.-F., and J. Oesterle, Courbes de Weil semi-stables de discriminant une puissance m-ieme, J. Reine Angew. Math. 400 (1989), 173-184. Miyake, T., Modular Forms, Springer-Verlag, Berlin Heidelberg New York, 1989. Mordell, L. J., On the rational solutions of the indeterminate equations of the third and fourth degrees, Proc. Cambridge Philos. Soc. 21 (1922-23), 179-192. Nagell, T., Solution de quelques problemes dans la theorie arithmetique des cubiques planes du premier genre, Skrifter Norske Videnskaps-Akademi i Oslo, 1935, No. 1, pp. 1-25. Neron, A., Modeles minimaux des varietes abeliennes sur les corps locaux et globaux, LH.E.S. Publ. Math. 21 (1964), 5-128. Newman, M., Construction and application of a class of modular functions II, Proc. London Math. Soc. 9 (1959), 373-387. Ogg, A. P., Elliptic curves and wild ramification, Amer. J. Math. 89 (1967), 1-21.
414 REFERENCES Ogg, A., Modular Forms and Dirichlet Series, W. A. Benjamin Inc., New York, 1969a. Ogg, A., On the eigenvalues of Hecke operators, Math. Annalen, 179 (1969b), 101-108. Ogg, A., Survey of modular functions of one variable, Modular Functions of One Variable I, Lecture Notes in Math. 320 (1973), Springer- Verlag, Berlin Heidelberg New York, 1-35. Patterson, S. J., Automorphic forms and number theory, Ecole Europeenne de Theorie des Groupes, CI.R.M. Marseille-Lurniny, preprint, 1991. Piatetski-Shapiro, I., Classical and adelic automorphic forms. An introduction, Automorphic Forms, Representations, and L-functions, Proc. Symposia in Pure Math. XXXIII, part 1 (1979), American Mathematical Society, 185-188. Rademacher, H., Uber die Erzeugenden von Kongruenzuntergruppen der Modulgruppe, Abhandlungen Math. Seminar Hamburg 7 (1930), 134-148. Ribet, K. A., On modular representations of Gal(Q/Q) arising from modular forms, Invent. Math. 100 (1990), 431-476. Robert, A., Formes automorphes sur GL2, Siminaire Bourbaki, vol. 1971/72, Exposees 400-417, Lecture Notes in Math. 317 (1973), Springer-Verlag, Berlin Heidelberg New York, 295-318. Selberg, A., Automorphic functions and integral operators, Seminars on analytic functions. Institute for Advanced Study, Princeton, N. J., and U. S. Air Force Office of Scientific Research, 2 (1957), 152-161. Selmer, E. S., The diophantine equation ax3-f by3 -f cz3 = 0, Acta Math. 85 (1951), 203-362, and 92 (1954), 191-197. Serre, J.-P., Abelian l-adic Representations and Elliptic Curves, W. A. Benjamin Inc., New York, 1968. Serre, J.-P., A course in Arithmetic, Springer-Verlag, New York Heidelberg Berlin, 1973a. Serre, J.-P., Congruences et formes modulaires, Seminaire Bourbaki, vol. 1971/72, Exposees 400-417, Lecture Notes in Math. 317 (1973b), Springer-Verlag, Berlin Heidelberg New York, 319-338. Serre, J.-P., Lettre a J.-F. Mestre, Current Trends in Arithmetical Algebraic Geometry, Contemporary Mathematics Series 67 (1987a), American Mathematical Society, Providence, 263-268. Serre, J.-P., Sur les representations modulaires de degre 2 de Gal(Q/Q), Duke Math. J. 54 (1987b), 179-230. Serre, J.-P., Lectures on the Mordell-Weil Theorem, Fried. Vieweg k, Sohn, Braunschweig, 2nd edition, 1990.
REFERENCES 415 Shahidi, F., Automorphic /^-functions: a survey, Automorphic Forms, Shimura Varieties, and L-functions I: Proceedings of a Conference Held at the Univesity of Michigan, Ann Arbor, July 6-16, 1988, , Perspectives in Mathematics 10 (1990), Academic Press Inc., San Diego, 415-437. Shimura, G., Reduction of algebraic varieties with respect to a discrete valuation of the basic field, Amer. J. Math. 77 (1955), 134-176. Shimura, G., Correspondances modulaires et les fonctions C de courbes algebriques, J. Math. Soc. Japan 10 (1958), 1-28. Shimura, G., Sur les integrates attachees aux formes automorphes, J. Math. Soc. Japan 11 (1959), 291-311. Shimura, G., Introduction to the Arithmetic Theory of Automorphic Functions, Princeton University Press, Princeton, 1971a. Shimura, G., On elliptic curves with complex multiplication as factors of the jacobians of modular function fields, Nagoya Math. J. 43 (1971b), 199-208. Shimura, G., Complex multiplication, Modular Functions of One Variable I, Lecture Notes in Math. 320 (1973a), Springer-Verlag, Berlin Heidelberg New York, 37-56. Shimura, G., On the factors of the jacobian variety of a modular function field, J. Math. Soc. Japan 25 (1973b), 523-544. Shimura, G., and Y. Taniyama, Complex Multiplication of Abelian Varieties and Its Applications to Number Theory, Publ. Math. Soc. Japan, no. 6, 1961, Kenkyusha Printing Co., Tokyo. Siegel, C. L., A simple proof of t;(-1/t) = Tj(T)y/rJi, Mathematika 1 (1954), p. 4. Siegel, C. L., Topics in Complex Function Theory, vol. 1, John Wiley & Sons, New York, 1969. Silberger, A. J., Introduction to Harmonic Analysis on Reductive p-Adic Groups, Princeton University Press, Princeton, 1979. Silverman, J. H., The Arithmetic of Elliptic Curves, Springer-Verlag, New York Berlin Heidelberg, 1986. Stein, E. M., and G. Weiss, Introduction to Fourier Analysis on Euclidean Spaces, Princeton University Press, Princeton, 1971. Swinnerton-Dyer, H. P. F., and B. J. Birch, Elliptic curves and modular functions, Modular Functions of One Variable IV, Lecture Notes in Math. 476 (1975), Springer-Verlag, Berlin Heidelberg New York, 2-32. Swinnerton-Dyer, H. P. F., N. M. Stephens, J. Davenport, J. Velu, F. B. Coghlan, A. O. L. Atkin,, D. J. Tingley, Numerical tables on elliptic curves, Modular Functions of One Variable IV, Lecture
416 REFERENCES Notes in Math. 476 (1975), Springer-Verlag, Berlin Heidelberg New York, 74-144. Taniyama, Y., L-functions of number fields and zeta functions of abelian varieties, J. Math. Soc. Japan 9 (1957), 330-366. Tate, J. T., Fourier analysis in number fields and Hecke's zeta functions, in J. W. S. Cassels and A. Frohlich, Algebraic Number Theory, Academic Press, London, 1967, 305-347. Tate, J. T., The arithmetic of elliptic curves, Invent Math. 23 (1974), 179-206. Tate, J., Algorithm for determining the type of a singular fiber in an elliptic pencil, Modular Functions of One Variable IV, Lecture Notes in Math. 476 (1975), Springer-Verlag, Berlin Heidelberg New York, 33-52. Tate, J., Number theoretic background, Auiomorphic Forms, Representations, and L-functions, Proc. Symposia in Pure Math. XXXIII, part 2 (1979), American Mathematical Society, 3-26. Titchmarsh, E. C, The Theory of the Riemann Zeta-Function, Oxford University Press, Oxford, 1951. Velu, J., Courbes elliptiques sur Q ayant bonne reduction en dehors de {11}, C. R. Acad. Sci. Paris, Ser. A 273 (1971a), 73-75. Velu, J., Isogenics entre courbes elliptiques, C. R. Acad. Sci. Paris, Ser. A 273 (1971b), 238-241. Velu, J., Review of "Numerical tables on elliptic curves/' Math. Reviews 52 (1976), #10557. Walker, R. J., Algebraic Curves, Dover Publications Inc., New York, 1950. Weil, A., Sur un theoreme de Mordell, Bull. Sci. Math. 54 (1930), 182- 191. Weil, A., Foundations of Algebraic Geometry, Amer. Math. Soc. Colloquium Publications XXIX, American Mathematical Society, 1946; 2nd edition, 1962. Weil, A., Uber die Bestimmung Dirichletscher Reihen durch Funktionalgleichungen, Math. Annalen 168 (1967), 149-156. Weil, A., Courbes Algebriques et Varietes Abeliennes, Hermann, Paris, 1971a. (Reprint of volumes 1041 and 1064 of Actualites Scientifiques et Industrielles, 1948.) Weil, A., Dirichlet Series and Automorphic Forms, Lecture Notes in Math. 189 (1971b), Springer-Verlag, Berlin Heidelberg New York. Zagier, D. B., L-series of elliptic curves, the Birch-Swinnerton-Dyer conjecture, and the class number problem of Gauss, Notices Amer. Math. Soc. 31 (1984), 739-743.
REFERENCES 417 Zagier, D. B., Modular parametrizations of elliptic curves, Canad. Math. Bull. 28 (1985), 372-384. Zagier, D., Elliptic Curves, lecture series, Tata Institute for Fundamental Research, Bombay, January 1988. Zagier, D., Introduction to modular forms, preprint, 1990. Zariski, O., and P. Samuel, Commutative Algebra, vol. I, D. Van Nostrand Company Inc., Princeton, 1958. Zariski, O., and P. Samuel, Commutative Algebra, vol. II, D. Van Nostrand Company Inc., Princeton, 1960.
INDEX OF NOTATION See also the list of Standard Notation on page xv. In the list below, Latin, German, and script letters appear together and are followed by Greek symbols and non-letters. <*i, <*2> 03, 04, ae, 42, 56, 57 ap, 294, 295 A(T), 331 i4*(r), 283 Ak(r0(N)), 333 f>2, &4, h, 681 57 cj,329 C4, c6, 57 c„, 225 c(m,X), 214, 387 C, 273 C(*,y,u»),37 C*,353 d, 59 deg, 317, 359 deg,, deg,, 359 df, 371 dt(n), 372 Biv(X), 316, 360 Divo(X), 316, 361 e, 170 eF(x), 359 £(*), 25 £(Q)> 15, 80 £(Q)u™, 130 F(n)(Q), 137, 138 End(E), 364,368 Endq(J), 374 Ep, 130, 135 /, 210 /, 243 /i(*,y), 26 ///, 345 /r, 212 /«., 314 f(r), 330 F, 365 F*, 349, 360 F., 360, 392 F(lfc), 25 F(t), 302 Fp, 135 F*, 28 9, 272, 317, 361 92, 02(A), 158, 222 03, 01(A), 158, 222 *2(r),*3(r),G2*(r), 223 G2(t), 266 G^2)(r), 266 G»,Ga*(A), 158, 222 ft, 97, 263 /»o, 95 H, 30 /MR). #!(€), 326 /^(R), i/*(C), 327 H1(X0(N),I),Z23 Hom(Ei,E2), 364, 368 ft, 223 ft*, 311 t(P, L, F), 32 /, 341 I(V), 342 /(V/*o), 342 /(W), 344 /(Wnib"), 345 i, 65, 333 419
420 INDEX OF NOTATION i(A), 222 otdx(D), 316, 360 j(t), 223, 333 O, 11, 67, 74, 362, 368 jN, 335 p, 151, 153 j, 371 -P, 12, 74 J, 371 P2(fc), 19 J(X), 318, 369 Pn\k), 344 J(Xo(7V)), 310 PA, 273 k[x,y,w)d, 24 Pic(X), 316, 360 M^], *M> 342 Pic°(C), 361 ko(V), k(V), 342 P + Q, 11,67 ibo(M^), Jb(W), 346 P • Q, 10, 43 JfelV]., 343 PQ, 10, 43 *[W], 346 q, 225 ifc[W]x,346 ?ft,263 K(X), 312 qN, 263 L, 19 Q, 285 Z-(Jb), 19 r, 102 L(s), 17 rp, 130, 135 L(s,f), 238, 267, 386 fl, 227 L(s,/,X),387 ft, 251 !(«,£?), 295 R(f,g), 45 £(*,x),201 [«(/,</)], 45 L(D), 316,361 fl(n), 248, 278 L(T), 329 flB/Q, 106 £,*, 20 Pyv,260 mx, 343 72, 170, 271, 305, 375 H.365 K\ 172, 271, 305 M(a, b), 186 5,228 M(n), 244 Sfc, 234 M(n, TV), 276, 320 5fc(r0(7V)), 265 M*(n), 256 5J(r0(JV)), 270 Mk, 234 S^ld(r0(/V)), 283 Mfc(r0(JV)), 265 5few(r0(7V)), 283 Mx, 344 S, 375 Mfl, 351 ^(R1), 211 n(C), 170 <(n), 372 n,,n2, 108 T, 228, 305 N, 390 T(n), 244, 275, 320, 321, 323 N(p), 16 T#(n), 372 ordx(/), 316,350 Tk(n), 244, 245, 275, 277
INDEX OF NOTATION 421 fp, 384 T.375 u, 63, 290, 293 U, 375 v, 351 Voo, Vi, vp, vT, 231 V, 342 V7*o, 342 V(to), 342 V4,V6, 305 V/, 341 w(C), 174 w(Q), 285 iutv, 269 wq, 285 W, 344 W(*o)> 344 WIt 344 Xo(ll), 305 A-0(7V), 310,311 q#, 253,261 a,, 244, 277, 320, 372 qn, 269 [a]*, 245, 261 [ll 323 r(AT), 250, 256 T0(N), 256 ro(r.p), 287 Ti, r2, 170 fi.fj, 177 ra, r6, rc, 170 rab, 321 ra, r^, r7,177 6(g,r), 229, 261 6f, 371 6<(n), 373 A, 58 A(A), 222 A(r), 223 <(«), 189 *?(r), 235 0(r), 208 0(r,X),216 A, 153, 223, 371 A(s), 18, 209 A(s,/), 240, 270, 386 A(«,/,x),387 A(s, E), 386, 387 h{s,E,X), 387 A(s,x),216 A;, 374 AT,223 (A, C7), 273 /i, 272, 371 /<«, A*2> /*3, 272 i/, 374 jt, 370 n, 153 />, 208, 375 <r, 208 <r,(n), 225 t = p + iff, 208, 224 r0, 302 *, 360, 384 ^,384 <p, 272, 357, 380 <p(r), 239, 268 ¥>a,90 $, 20, 318, 345, 369 $,370 $#, 372 */(7), 303 *N(X), 335 X.201 X*. 213 Xo, 201 w, 314, 370, 374 uuu3, 153,223,274 fi(X), 312 QhoiPO, 313, 370
422 INDEX OF NOTATION oo, 79, 231, 261 ||,90 (•)", 210, 365 (•)*, 328 (•,•), 242, 280, 326,371 [•], 323, 365 [•]*, 245, 261 I • \P, 134 (- J , 272, 298, 393 H .83
INDEX Abel's Theorem, 318 abelian subvariety, 368 variety, 367 addition, 162, 363 elliptic curve, 11 formula, 76 adeles, 407 admissible change of variables, 63, 363 affine algebraic set, 342 coordinate ring, 342 curve, 25, 343 local coordinates, 21, 345 problem, 3 variety, 342 algebraic integer, 122, 123 algebraic numbers, 122 algebraic set afnne, 342 projective, 344 analytic continuation, 166, 167 arithmetic-geometric mean, 185, 402 associate Dirichlet characters, 213 associativity, 12, 67 Atkin-Lehner Theorem, 283, 289 automorphic function, 333 automorphic L function, xi, 208 bad prime, 108 Bezout's Theorem, 27, 47 birational map 348 Birch and Swinnerton-Dyer Conjecture, 17, 208, 386 canonical class, 316, 361 canonical height, 95, 97 canonical Q structure, 333, 341, 349, 357 Carayol, 391 change of variables, admissible, 63, 362 character, 200 chord case, 75 chord-tangent composition rule, 11, 43 class group divisor, 316, 361 ideal, 127 class number, 126 complex multiplication, 164, 407 complex torus, 160, 183, 370 conductor, 213, 390 congruence subgroup, 250, 256 congruent numbers, 52, 110, 112, 115 conic, 25 continuation, analytic, 166, 167 convergent product, 195 coordinate ring, 342 coordinates, affine local, 345 correspondence, 385 cubic, 25, 40, 56 nonsingular, 58 singular, 58 curve affine, 25, 343 elliptic, 42 nonsingular, 27 projective, 345 projective plane, 24 same, 24 singular, 27 smooth, 27 cusp, 77, 262 cusp form, 225, 261 defined at, 344, 346, 348 denned over, 342, 344, 347, 361, 362, 368 degenerate, 68 degree, 243, 249, 273, 317, 359, 360 descent, 80 differential holomorphic, 312 invariant, 314 meromorphic, 312 423
424 INDEX dimension, 343, 346 dimension formula, 235, 272 Diophantus, 3, 6, 7, 10, 50 Diophantus method, 6, 10, 115, 119 Dirichlet character, 201 associate, 213 conductor of, 213 extension, 213 primitive, 213 principal, 201 Dirichlet L function, xii, 201 Dirichlet series, 192 Dirichlet Unit Theorem, 125 Dirichlet's Theorem, xii, 148, 189 discrete valuation ring, 351 discriminant, 15, 58, 59, 125, 226 divisor, 316, 360 divisor class group, 316, 361 divisor, principal, 316, 360 dominant morphism, 349 dominate, 392 Double Series Theorem, 158 doubling formula, 76 doubly periodic, 152 dual isogeny, 309, 365 Eichler-Shimura theory, xii, 302, 374, 389, 390, 404 eigenform, 280 elliptic curve, 13, 42, 160, 183, 362 modular, 221 elliptic element, 303 elliptic function, 152 elliptic integral, 174 elliptic regulator, 106 equivalent eigenform, 280 equivalent ideals, 126 Euler product, 196, 202, 255, 282, 289, 295 first degree, 197 second degree, 199 even total winding number, 170 extension of Dirichlet character, 213 fairly bad prime, 108 Fexmat, 52, 110, 112 Fermat's Last Theorem, xii, 3, 18, 50, 54, 58, 81, 397, 399 Fermat's method of descent, 80 Fermat's problem for Mersenne, 55, 119 Fibonacci, 52 first degree Euler product, 197 flex, 35 Fourier inversion formula, 200, 210 Fourier transform, 200, 210 Frey-Serre-Ribet, xii, 18, 399 Frobenius morphism, 360, 384 glh power, 360 function element, 166 function field, 312, 319, 342, 346 of dimension 1, 349 functional equation, 209, 216, 240, 270, 289, 386, 387, 388 fundamental domain, 228, 260 fundamental parallelogram, 152 Gauss, 185, 402 Gauss sum, 214, 387 genus, 272, 317, 361, 395 global minimal, 290 good prime, 108 Hasse Principle, 5, 15, 16 Hasse's Theorem, 16, 296 Hasse-Minkowski Theorem, 5 Hasse-Weil Conjecture, 18, 387 Hasse-Weil L function, 403 Hecke operator, 244, 275, 320, 321, 323, 372 Hecke subgroup, 256 Hecke's Theorem, 249, 279 Hecke-Petersson Theorem, 255, 282 height, 95, 97 Hessian matrix, 30 Hilbert Basis Theorem, 342 Hilbert Nullstellensatz, 342 holomorphic at oo, 225 holomorphic at the cusp, 261, 263 holomorphic differential, 312 homogeneous, 243, 273 homogeneous ideal, 344
INDEX 425 homogeneous polynomial, 22 homology, 317, 321, 392 homomorphism, 368 ideal class group, 127 ideal of variety, 342, 344 identity, elliptic curve, 11 Igusa, 374, 390 inflection point, 35 integral, elliptic, 174 intersection multiplicity, 32 invariant differential, 314 inversion, 269, 285 inversion problem, 165 irreducible, 342, 345 isogenous, 365, 369 isogeny, 164, 309, 363, 369 dual, 309, 369 isomorphic elliptic curves, 63 isomorphic over, 349 isomorphic varieties, 349 isomorphism, 368 j-invariant, 65, 226 Jacobi Inversion Theorem, 319 Jacobian variety, 310, 318, 361, 369, 370 L function, xi, 17, 201, 238, 267, 295 automorphic, xi, 208 Dirichlet, xii, 201 motivic, xi, 207 Langlands program, xiii lattice, 243, 318 Legendre symbol, 272, 298, 393 Legendre's Theorem, 5 level, 261, 333, 390 Lie group, 376 line, 19, 25 at infinity, 19 same, 19 tangent, 27 linear system, 316, 361 Liouville Theorems, 152, 153 local coordinates, affine, 21, 345 local L factor, 294 local ring, 343, 346 Lutz-Nagell Theorem, xi, 15, 130, 144 Mazur's Theorem, 15 Mellin transform, 238 meromorphic differential, 312 Mestre-Oesterle, 396 method of descent, 80 method of Diophantus, 6, 10, 115, 119 minimal Weierstrass equation, 290 at prime p, 290 global, 290 Minkowski, 5, 104 model, 333, 341, 349, 357 modular elliptic curve, 221 modular form, 224, 225, 261 unrestricted, 224, 261 modular pair, 273 modular parametrization, 310, 390 modular polynomial, 335 Mordell's Theorem, xi, 14, 80, 95, 402 Mordell-Weil Theorem, xi, 80, 95, 402 morphism, 348, 356 dominant, 349 Frobenius, 360, 384 motivic L function, xi, 207 multiplicative, 196 strictly, 197 Multiplicity One Theorem, 283 multiplicity, intersection, 32 Nagell, xi, 15, 130, 144 naive height, 95 negative, 12 newform, 283 Newton, 10 node, 77 Noetherian, 342 nondegenerate, 68 nonsingular, 25, 27, 58, 161, 343, 347 n on split case, 79 norm, 123 Nullstellensatz, 342 number field, 122 oldform, 283
426 INDEX order, 153 p-adic filtration, 138 p-adic norm, 134 p-integral, 135 p-reduced, 135 parabolic element, 303 parallelogram fundamental, 152 period, 152 period, 151, 184 period lattice, 153 period parallelogram, 152 Petersson inner product, 242, 252, 280 plane curve, 24 Poincare, 13, 67 point nonsingular, 25, 343, 347 of inflection, 35 singular, 25, 77 points, 19, 25, 342, 344 at infinity, 19 Poisson Summation Formula, 211 pole, 360 prescribed torsion, 145 primitive, 213 Principal Axis Theorem, 37 principal Dirichlet character, 201 principal congruence subgroup, 250, 256 principal divisor, 316, 360 product convergent, 195 of ideals, 126 of varieties, 347 projective algebraic set, 344 closure, 345 curve, 345 n-space, 344 plane, 19 plane curve, 24 problem, 3 transformation, 20, 345 variety, 345 purely inseparable, 359 Pythagorean triple, 7 9-expansion, 225, 263, 265 quadratic residue symbol, 272, 298, 393 ramified, 359 rank, 15, 102, 107 rational map, 347 rational points, 19, 25, 342, 344 reciprocal roots, 198 reduction modulo p, 130, 134, 384 regular, 344, 346, 348 regulator, 106 representation theory, xiii, 407 result of rearranging, 167 resultant, 44 Ribet, xii, 18, 398, 399 Riemann-Roch Theorem, 317, 361 Riemann surface, 170, 311, 312, 316 Riemann zeta function 189, 194 same curve, 24 same line, 19 Schwartz function, 211 second degree Euler product, 199 Segre embedding, 347 Selmer's example, 8 separable, 359 Serre, xii, 18, 392, 399 Shimura, xii, 302, 374, 389, 390, 390, 404 Shimura-Taniyama, 373 singular, 12, 25, 27, 58, 77 smooth curve, 27 split case, 79 square root, 169 strictly multiplicative, 197 strong Weil curve, 392 subvariety, 368 summation by parts, 192 Swinnerton-Dyer, 17, 208, 386 tangent case, 75 tangent line, 27 Taniyama, 373 Taniyama-Weil Conjecture, xii, 18, 208, 221, 390, 399
INDEX 427 Tate normal form, 147 theta function, 209, 216 torsion subgroup, 15, 130 prescribed, 145 torus, complex, 160, 183, 370 total winding number, 170 trace, 123, 396 translation, 364 twist, 308, 393 ultrametric inequality, 134 uniformizer, 350 unique factorization, 92, 127 unit, 92, 124 unramified, 359 unrestricted modular form, 224, 261 valuation, 351 vanishes at the cusp, 263 variety abelian, 367 a/fine, 342 projective, 345 very bad prime, 108 Wedderburn's Theorem, 375 Weierstrass Double Series Theorem, 158 equation, 362 minimal, 290 form, 13, 42, 56, 57, 401 p function, 153 weight, 63, 224, 261, 333 Weil, xii, 18, 208, 221, 387, 390, 399 Weil conjectures, 403 Weil Converse Theorem, 388 Weil curve, 390 strong, 392 width, 263 winding number, total, 170 zero, 360 zeta function, xi, 189, 194, 295
MATHEMATICAL NOTES ted by Luis A. Caffarelli, John N. Mather, and Elias M. Stt~. 22. On Uniformization of Complex Manifolds: The Role of Connections, by R. C. Gunning 23. Introduction to Harmonic Analysis on Reductive P-adic Groups, by Allan J. SlLBERGER 24. Lectures on Pseudo-Differential Operators: Regularity Theorems and Applications to Non-Elliptic Problems, by Alexander Nagel and E. M. Stein 25. Non-Abelian Minimal Closed Ideals of Transitive Lie Algebras, by J. F. Conn 26. Exact Sequences in the Algebraic Theory of Surgery, by Andrew Ranicki 27. Existence and Regularity of Minimal Surfaces on Riemannian Manifolds, by Jon T. Pitts 28. Hardy Spaces on Homogenous Groups, by G. B. Folland and E. M. Stein 29. Lectures on Exponential Decay of Solutions of Second-Order Elliptic Equations: Bounds on Eigenfunctions of N-Body Schrodinger Operators, by Shmuel Agmon 30. Formal Knot Theory, by Louis H. Kauffman 31. Cohomology of Quotients in Symplectic and Algebraic Geometry, by Frances Clare Kirwan 32. Predicative Arithmetic, by Edward Nelson 33. Lectures on Counterexamples in Several Complex Variables, by John Erik Forn/ESS and Berit Stensones 34. Lie Groups, Lie Algebras, and Cohomology, by Anthony W, Knapp 35. Plateau's Problem and the Calculus of Variations, by Michael Struwe 36. Casson's Invariant for Oriented Homology 3-Spheres: An Exposition, by Selman Akbulut and John D. McCarthy 37. Analytic Pseudodifferential Operators for the Heisenberg Group and Local Solvability, by Daryl Geller 38. Several Complex Variables: Proceedings of the Mittag-Leffler Institute, 1987-1988, edited by John E. Forn^ss 39. ©-modules and Spherical Representations, by Frederic V. Bien 40. Elliptic Curves, by Anthony W. Knapp A complete catalogue of Princeton mathematics and science books, with prices, is available upon request. PRINCETON UNIVERSITY PRESS Princeton, New Jersey 08540 9 780691 085593