== PBC internals == The source code is organized by subdirectories: *`include`*: Headers describing the official API. Headers in other places are for internal use only. *`arith`*: Finite fields: modular arithmetic, polynomial rings, and polynomial rings modulo a polynomial. Finite fields of low characteristic are unsupported. *`ecc`*: Elliptic curve generation, elliptic curve groups and pairings. One source file is dedicated to each type of pairing, containing specialized optimizations. Some of the code requires arbitrary precision complex numbers, which also live here but should be moved elsewhere one day. *`misc`*: Dynamic arrays, symbol tables, benchmarking, logging, debugging, other utilities. *`gen`*: Programs that generate pairing parameters and list Hilbert polynomials. These were used to prepare the samples in the `param` directory. *`example`*: Example programs showing how to use the library. *`guru`*: Tests, experimental code. === Groups, rings, fields === Algebraic structures are represented in the +field_t+ data type, which mostly contains pointers to functions written to perform operations such as addition and multiplication in that particular group, ring or field: struct field_s { ... void (*init)(element_ptr); void (*clear)(element_ptr); ... void (*add)(element_ptr, element_ptr, element_ptr); void (*sub)(element_ptr, element_ptr, element_ptr); void (*mul)(element_ptr, element_ptr, element_ptr); ... }; typedef struct field_s *field_ptr; typedef struct field_s field_t[1]; The name +algebraic_structure_t+ is arguably more accurate, but far too cumbersome. It may help if one views groups and rings as handicapped fields. The last two lines of the above code excerpt show how GMP and PBC define data types: they are arrays of length one so that when a variable is declared, space is automatically allocated for it on the stack. Yet when used as a argument to a function, a pointer is passed, thus there is no need to explicitly allocate and deallocate memory, nor reference and dereference variables. Each +element_t+ contains a field named +field+ to such a +field_t+ variable. The only other field is +data+, which stores any data needed for the implementation of the particular algebraic structure the element resides in. struct element_s { struct field_s *field; void *data; }; When an +element_t+ variable is initialized, +field+ is set appropriately, and then the initialization specific to that field is called to complete the initialization. Here, a line of code is worth a thousand words: void element_init(element_t e, field_ptr f) { e->field = f; f->init(e); } Thus during a call to one of the `element_` functions, the +field+ pointer is followed then the appropriate routine is executed. For example, modular addition results when the input element is an element of a finite field, while polynomial addition is performed for elements of a polynomial ring and so on. void element_add(element_t n, element_t a, element_t b) { n->field->add(n, a, b); } My design may seem dangerous because if a programmer inadvertently attempts to add a polynomial and a point on an elliptic curve, say, the code will compile without warnings since they have the same data type. However I settled on having a catch-all ``glorified +void *+'' +element_t+ because I wanted to - extend a field an arbitrary number of times (though in practice, currently I only need to extend a field twice at most), - switch fields easily, so for example a program that benchmarks addition in polynomial rings can be trivially modified to benchmark addition in a group, and - interchange different implementations of the same algebraic structure, for example, compare Montgomery representation versus a naive implementation of integer modulo rings. Additionally, defining `PBC_DEBUG` catches many type mismatches. In mathematics, groups, rings and fields should be distinguished, but for implmentation, it is simplest lump them together under the same heading. In any event, distinct data types may lead to a false sense of security. Fields of prime order with different moduli would still fall under the same data type, with unpleasant results if their elements are mistakenly mixed. I have vague plans to add flags to +field_t+ describing the capabilities of a particular +field_t+. These flags would be set during initialization, and would indicate for example whether one can invert every nonzero element, whether there are one or two operations (that is, group versus ring), whether the field is an integer mod ring, polynomial ring, or polynomial mod ring, and so on. Once in place, more runtime checks can be performed to avoid illegal inversion and similar problems. Another option is to introduce data types for each of the four pairing-related algebraic structures, namely G1, G2, GT and Zr, as these are the only ones needed for implementing pairing-based cryptosystems. An alternative was to simply use +void *+ instead of +element_t+ and require the programmer to pass the field as a parameter, e.g. +element_add(a, b, c, F_13)+, but I decided the added annoyance of having to type this extra variable every time negated any benefits, such as obviating the need for the +field+ pointer in +struct element_s+, even if one ignores the more serious problem that runtime type checking is considerably harder, if not impossible. I suppose one could write a preprocessor to convert one type of notation to the other, but I would like the code to be standard C. (On the other hand, as Hovav Shacham suggested, it may be nice to eventually have a converter that takes human-friendly infix operator expressions like `a = (b + c) * d` and outputs the assembly-like `element_` equivalents.) === Internal randomness === Some algorithms require a quadratic nonresidue in a given field. These are computed lazily: The first time a quadratic nonresidue is requested, one is generated at random, using the same source of random bits as other PBC random functions. [Which reminds me, should I get rid of the +nqr+ field and instead have it as part of the +data+ field in struct field_s?] In `fieldquadratic.c`, a quadratic field extension is constructed with a square root of this randomly generated quadratic nonresidue in the base field. Thus for a nondeterminstic source of random bits, the same field may be constructed differently on different runs. To construct the same field the same way every time, one must record the quadratic nonresidue generated from one run, and call `field_set_nqr()` every time this particular construction of a quadratic field extension is desired. Another use for this function is to save time by setting the quadratic nonresidue to some precomputed value. Similarly, for higher degree extensions, a random irreducible polynomial may be chosen to construct it, but this must be recorded if the same construction is later required. This happens behind the scenes in PBC. === Type A internals === Type A pairings are constructed on the curve y^2^ = x^3^ + x over the field F_q for some prime q = 3 mod 4. Both G1 and G2 are the group of points E(F_q), so this pairing is symmetric. It turns out #E(F_q) = q + 1 and #E(F_q^2^) = (q + 1)^2^. Thus the embedding degree k is 2, and hence GT is a subgroup of F_q^2. The order r is some prime factor of q + 1. Write q + 1 = r * h. For efficiency, r is picked to be a Solinas prime, that is, r has the form 2^a^ +- 2^b^ +- 1 for some integers 0 < b < a. Also, we choose q = -1 mod 12 so F_q^2^ can be implemented as F_q[i] (where i = sqrt(-1)) and since q = -1 mod 3, cube roots in F_q are easy to compute. This latter feature may be removed because I have not found a use for it yet (in which case we only need q = -1 mod 4). +a_param+ struct fields: exp2, exp1, sign1, sign0, r: r = 2^exp2 + sign1 * 2^exp1 + sign0 * 1 (Solinas prime) q, h: r * h = q + 1 q is a prime, h is a multiple of 12 (thus q = -1 mod 12) Type A1 uses the same equation, but have different fields since the library is given r and cannot choose it. +a1_param+ struct fields: p, n, l: p + 1 = n * l p is prime, same as the q in a_param, n is the order of the group. === Type B internals === Unimplemented. Similar to type A. The curve y^2^ = x^3^ + 1 over the field F_q for some prime q = 2 mod 3, which implies cube roots in F_q are easy to compute, though we can achieve this for type A pairings by constraining q appropriately. I recommend requiring q = 3 mod 4 as well, so that -1 is a quadratic nonresidue. The lack of an x term simplifies some routines such as point doubling. It turns out we must choose between symmetry or efficiency due to the nature of a certain optimization. === Type C internals === Unimplemented. The supersingular curves y^2^ = x^3^ + 2x + 1 and y^2^ = x^3^ + 2x - 1 over a field of characteristic 3. Discussed at length by Boneh, Lynn, and Shacham, "Short signatures from the Weil pairing". Many optimizations can be applied to speed up these pairings; see Barreto et al., "Efficient algorithms for pairing-based cryptosystems", but sadly, an attack due to Coppersmith makes these curves less attractive. === Type D internals === These are ordinary curves of with embedding degree 6, whose orders are prime or a prime multiplied by a small constant. A type D curve is defined over some field F_q and has order h * r where r is a prime and h is a small constant. Over the field F_q^6^ its order is a multiple of r^2^. Typically the order of the curve E is around 170 bits, as is F_q, the base field, thus q^k^ is around the 1024-bit mark which is commonly considered good enough. +d_param+ struct fields: q F_q is the base field n # of points in E(F_q) r large prime dividing n h n = h * r a E: y^2 = x^3 + ax + b b nk # of points in E(F_q^k) hk nk = hk * r * r coeff0 coefficients of a monic cubic irreducible over F_q coeff1 coeff2 nqr quadratic nonresidue in F_q These were discovered by Miyaji, Nakabayashi and Takano, "New explicit conditions of elliptic curve traces for FR-reduction". === Type E Internals === The CM (Complex Multiplication) method of constructing elliptic curves starts with the Diophantine equation DV^2 = 4q - t^2 If t = 2 and q = D r^2^ h^2^ + 1 for some prime r (which we choose to be a Solinas prime) and some integer h, we find that this equation is easily solved with V = 2rh. Thus it is easy to find a curve (over the field F_q) with order q - 1. Note r^2^ divides q - 1, thus we have an embedding degree of 1. Hence all computations necessary for the pairing can be done in F_q alone. There is never any need to extend F_q. As q is typically 1024 bits, group elements take a lot of space to represent. Moreover, many optimizations do not apply to this type, resulting in a slower pairing. +e_param+ struct fields: exp2, exp1, sign1, sign0, r: r = 2^exp2 + sign1 * 2^exp1 + sign0 * 1 (Solinas prime) q, h q = h r^2 + 1 where r is prime, and h is 28 times a perfect square a, b E: y^2 = x^3 + ax + b === Type F internals === Using carefully crafted polynomials, k = 12 pairings can be constructed. Only 160 bits are needed to represent elements of one group, and 320 bits for the other. Also, embedding degree k = 12 allows higher security short signatures. (k = 6 curves cannot be used to scale security from 160-bits to say 256-bits because finite field attacks are subexponential.) +f_param+ struct fields: q: The curve is defined over Fq r: The order of the curve. b: E: y^2= x^3 + b beta: A quadratic nonresidue in Fq: used in quadratic extension. alpha0, alpha1: x^6 + alpha0 + alpha1 sqrt(beta) is irreducible: used in sextic extension. Discovered by Barreto and Naehrig, "Pairing-friendly elliptic curves of prime order". === Type G Internals === Another construction based on the CM method. +g_param+ struct fields: q, n, h, r: h * r = n is the order of E(F_q) a, b: E: y^2 = x^3 + ax + b nk: #E(F_q^10) hk: hk * r^2 = nk coeff: array of coefficients of polynomial used for quintic extension. nqr: a quadratic nonresidue +g_param+ struct fields: Discovered by Freeman, "Constructing pairing-friendly elliptic curves with embedding degree 10." === Type I Internals === Type I pairings is symmetric, constructed on a supersingular curve y^2^ = x^3^ - x + 1 over a ternary extension field F_{3^m^}. The embedding degree k is 6. Both G1 and G2 are the group of points E(F_{3^m^}). GT is a subgroup of F_{3^6*m^}. The group order is a prime number. parameters: m, t: The ternary extension field is F(3)[x]/(x^m^ + x^t^ + 2). n: the order of G1 n2: n * n2 = number of points in E(F_{3^m^}) Introduced by Barreto et al, "Efficient Pairing Computation on Supersingular Abelian Varieties", Designs, Codes and Cryptography, vol. 42, no. 3, pp. 239-271, Mar. 2007. === Testing functions === For testing, debugging, demonstrations and benchmarks. Declared in +pbc_test.h+: include::gen/test.txt[] === Dynamic arrays === The +darray_t+ data type manages an array of pointers of type +void \*+, allocating more memory when necessary. Declared in +pbc_darray.h+. include::gen/darray.txt[] === Symbol tables === The +symtab_t+ data type manages symbol tables where the keys are strings of type +char \*+ and the values are pointers of type +void \*+. At present, they are implemented inefficiently using dynamic arrays, but this will change if the need arises. They are only used when reading a +pbc_param_t+ from a string. Declared in +pbc_symtab.h+. include::gen/symtab.txt[] === Religious stances === I chose C because: - GMP, which PBC requires and is also modeled on, is also written in C. - PBC is intended to be a low-level portable cryptographic library. C is the least common denominator. It should not be difficult to wrap PBC for other languages. - Despite its drawbacks (I would appreciate operator overloading and genericity, and to a lesser extent garbage collection), I've found few languages I like better. To quote Rob Pike, C is the desert island language. (I also agree with his statement that OO languages conceptually provide little extra over judicious use of function pointers in C.) With respect to indentation, I'm migrating the code to follow http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml[Google C++ Style Guide] to avoid having to switch styles all the time. The code was originally written using my old style: 4-space indent with 1TBS (One True Brace Style). I'd like to have no library dependencies (except standard C libraries), but then I'd have to write a large integer library. Furthermore, I'd have to write it in assembly, and then port it. To avoid this, I use an existing library. I selected GMP because the library's focus is on multiprecision arithmetic and nothing else, and it aims to be as fast as possible on many platforms. Another important factor is that GMP is released under a free license. On the other hand, GMP is written to deal with extremely large numbers, while I mostly only need integers that are roughly between 160 and 2048 bits. It is possible a library specializing in numbers of these sizes would be better for PBC. I'm fond of GMP's method for eliminating the need for the +&+ and +*+ operators most of the time by declaring a typedef on arrays of size 1. I try to do the same with PBC for consistency, though this trick does have drawbacks. I would like to have GMP as the only library dependency, though I do not mind using other libraries so long as they are optional. For example, one of the test programs is much easier to use if compiled with the GNU readline library, but by default compiles without it and is still functional. I dislike the C preprocessor. I like to place platform-specific code in separate files and let the build system work out which one to use. Integer constants can be defined with enum instead. I intend to minimize the number of +#include+ statements in header files for PBC's internal use as much as possible (they should be in the `.c` files instead), and later perhaps even remove those annoying +#ifndef+ statements too. I grudgingly accept some macros for PBC's debugging features. I liberally use nested functions, a GNU C extension. I find their expressiveness so indispensable that I'm willing to sacrifice portability for them. The http://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html[GNU libc manual] states that data types ending in +_t+ should not be used because they are reserved for future additions to C or POSIX. On the other hand, I want to stay consistent with GMP, and ending data types with +_t+ is common practice.