== PBC internals ==

The source code is organized by subdirectories:

*`include`*: Headers describing the official API. Headers in other places
are for internal use only.

*`arith`*: Finite fields: modular arithmetic, polynomial rings, and polynomial
rings modulo a polynomial. Finite fields of low characteristic are unsupported.

*`ecc`*: Elliptic curve generation, elliptic curve groups and pairings. One
source file is dedicated to each type of pairing, containing specialized
optimizations. Some of the code requires arbitrary precision complex numbers,
which also live here but should be moved elsewhere one day.

*`misc`*: Dynamic arrays, symbol tables, benchmarking, logging, debugging,
other utilities.

*`gen`*: Programs that generate pairing parameters and list Hilbert
polynomials. These were used to prepare the samples in the `param` directory.

*`example`*: Example programs showing how to use the library.

*`guru`*: Tests, experimental code.

=== Groups, rings, fields ===

Algebraic structures are represented in the +field_t+ data type, which mostly
contains pointers to functions written to perform operations such as addition
and multiplication in that particular group, ring or field:

  struct field_s {
    ...
    void (*init)(element_ptr);
    void (*clear)(element_ptr);
    ...
    void (*add)(element_ptr, element_ptr, element_ptr);
    void (*sub)(element_ptr, element_ptr, element_ptr);
    void (*mul)(element_ptr, element_ptr, element_ptr);
    ...
  };
  typedef struct field_s *field_ptr;
  typedef struct field_s field_t[1];

The name +algebraic_structure_t+ is arguably more accurate, but far too
cumbersome.  It may help if one views groups and rings as handicapped fields.

The last two lines of the above code excerpt show how GMP and PBC define data
types: they are arrays of length one so that when a variable is
declared, space is automatically allocated for it on the stack.
Yet when used as a argument to a function, a pointer is passed, thus there is
no need to explicitly allocate and deallocate memory, nor reference and
dereference variables.

Each +element_t+ contains a field named +field+ to such a +field_t+ variable.
The only other field is +data+, which stores any data needed for the
implementation of the particular algebraic structure the element resides in.

  struct element_s {
    struct field_s *field;
    void *data;
  };

When an +element_t+ variable is initialized, +field+ is set appropriately, and
then the initialization specific to that field is called to complete the
initialization. Here, a line of code is worth a thousand words:

  void element_init(element_t e, field_ptr f) {
    e->field = f;
    f->init(e);
  }

Thus during a call to one of the `element_` functions, the +field+ pointer is
followed then the appropriate routine is executed. For example, modular addition
results when the input element is an element of a finite field, while
polynomial addition is performed for elements of a polynomial ring and so on.

  void element_add(element_t n, element_t a, element_t b) {
    n->field->add(n, a, b);
  }

My design may seem dangerous because if a programmer inadvertently attempts
to add a polynomial and a point on an elliptic curve, say, the code
will compile without warnings since they have the same data type.

However I settled on having a catch-all ``glorified +void *+'' +element_t+
because I wanted to

- extend a field an arbitrary number of times (though in practice, currently I
  only need to extend a field twice at most),
- switch fields easily, so for example a program that benchmarks addition in
  polynomial rings can be trivially modified to benchmark addition in a group,
  and
- interchange different implementations of the same algebraic structure, for
  example, compare Montgomery representation versus a naive implementation of
  integer modulo rings.

Additionally, defining `PBC_DEBUG` catches many type mismatches.

In mathematics, groups, rings and fields should be distinguished, but for
implmentation, it is simplest lump them together under the same heading.
In any event, distinct data types may lead to a false sense of security.
Fields of prime order with different moduli would still fall under the same
data type, with unpleasant results if their elements are mistakenly mixed.

I have vague plans to add flags to +field_t+ describing the capabilities of a
particular +field_t+.  These flags would be set during initialization, and
would indicate for example whether one can invert every nonzero element,
whether there are one or two operations (that is, group versus ring), whether
the field is an integer mod ring, polynomial ring, or polynomial mod ring, and
so on. Once in place, more runtime checks can be performed to avoid illegal
inversion and similar problems.

Another option is to introduce data types for each of the four pairing-related
algebraic structures, namely G1, G2, GT and Zr, as these are the only ones
needed for implementing pairing-based cryptosystems.

An alternative was to simply use +void *+ instead of +element_t+ and require
the programmer to pass the field as a parameter, e.g. +element_add(a, b, c,
F_13)+, but I decided the added annoyance of having to type this extra variable
every time negated any benefits, such as obviating the need for the
+field+ pointer in +struct element_s+, even if one ignores
the more serious problem that runtime type checking is considerably harder, if
not impossible.

I suppose one could write a preprocessor to convert one type of notation
to the other, but I would like the code to be standard C.  (On the other hand,
as Hovav Shacham suggested, it may be nice to eventually have a converter that
takes human-friendly infix operator expressions like `a = (b + c) *
d` and outputs the assembly-like `element_` equivalents.)

=== Internal randomness ===

Some algorithms require a quadratic nonresidue in a given field. These
are computed lazily: The first time a quadratic nonresidue is requested, one is
generated at random, using the same source of random bits as other PBC random
functions. [Which reminds me, should I get rid of the +nqr+ field and instead
have it as part of the +data+ field in struct field_s?]

In `fieldquadratic.c`, a quadratic field extension is constructed with a square
root of this randomly generated quadratic nonresidue in the base field.  Thus
for a nondeterminstic source of random bits, the same field may be constructed
differently on different runs.

To construct the same field the same way every time, one must record the
quadratic nonresidue generated from one run, and call `field_set_nqr()` every
time this particular construction of a quadratic field extension is desired.
Another use for this function is to save time by setting the quadratic
nonresidue to some precomputed value.

Similarly, for higher degree extensions, a random irreducible polynomial
may be chosen to construct it, but this must be recorded if the same
construction is later required.

This happens behind the scenes in PBC.

=== Type A internals ===

Type A pairings are constructed on the curve y^2^ = x^3^ + x over the field F_q
for some prime q = 3 mod 4.
Both G1 and G2 are the group of points E(F_q), so this
pairing is symmetric.  It turns out #E(F_q) = q + 1 and
#E(F_q^2^) = (q + 1)^2^. Thus the embedding degree k is 2,
and hence GT is a subgroup of F_q^2.  The order r is some prime
factor of q + 1.

Write q + 1 = r * h. For efficiency, r is picked to be a Solinas prime,
that is, r has the form 2^a^ +- 2^b^ +- 1 for some integers 0 < b < a.

Also, we choose q = -1 mod 12 so F_q^2^ can be implemented as F_q[i]
(where i = sqrt(-1)) and since q = -1 mod 3, cube roots in F_q
are easy to compute. This latter feature may be removed because I have
not found a use for it yet (in which case we only need q = -1 mod 4).

+a_param+ struct fields:

  exp2, exp1, sign1, sign0, r:
    r = 2^exp2 + sign1 * 2^exp1 + sign0 * 1 (Solinas prime)
  q, h:
    r * h = q + 1
    q is a prime, h is a multiple of 12 (thus q = -1 mod 12)

Type A1 uses the same equation, but have different fields since the library
is given r and cannot choose it.

+a1_param+ struct fields:

  p, n, l:
    p + 1 = n * l
    p is prime, same as the q in a_param, n is the order of the group.

=== Type B internals ===

Unimplemented. Similar to type A. The curve y^2^ = x^3^ + 1 over the field F_q
for some prime q = 2 mod 3, which implies cube roots in F_q are easy to
compute, though we can achieve this for type A pairings by constraining q
appropriately. I recommend requiring q = 3 mod 4 as well, so that -1 is
a quadratic nonresidue.

The lack of an x term simplifies some routines such as point doubling.

It turns out we must choose between symmetry or efficiency due to the nature of
a certain optimization.

=== Type C internals ===

Unimplemented. The supersingular curves y^2^ = x^3^ + 2x + 1 and
y^2^ = x^3^ + 2x - 1 over a field of characteristic 3. Discussed at length
by Boneh, Lynn, and Shacham, "Short signatures from the Weil pairing".
Many optimizations can be applied to speed up these pairings; see
Barreto et al., "Efficient algorithms for pairing-based cryptosystems", but
sadly, an attack due to Coppersmith makes these curves less attractive.

=== Type D internals ===

These are ordinary curves of with embedding degree 6, whose orders are prime
or a prime multiplied by a small constant.

A type D curve is defined over some field F_q and has order h * r where
r is a prime and h is a small constant. Over the field F_q^6^ its order is
a multiple of r^2^.

Typically the order of the curve E is around 170 bits, as is F_q, the base
field, thus q^k^ is around the 1024-bit mark which is commonly considered
good enough.

+d_param+ struct fields:

  q   F_q is the base field
  n   # of points in E(F_q)
  r   large prime dividing n
  h   n = h * r
  a   E: y^2 = x^3 + ax + b
  b
  nk  # of points in E(F_q^k)
  hk  nk = hk * r * r
  coeff0 coefficients of a monic cubic irreducible over F_q
  coeff1
  coeff2
  nqr quadratic nonresidue in F_q

These were discovered by Miyaji, Nakabayashi and Takano,
"New explicit conditions of elliptic curve traces for FR-reduction".

=== Type E Internals ===

The CM (Complex Multiplication) method of constructing elliptic curves
starts with the Diophantine equation

    DV^2 = 4q - t^2

If t = 2 and q = D r^2^ h^2^ + 1 for some prime r (which we choose to
be a Solinas prime) and some integer h, we find that this equation is easily
solved with V = 2rh.

Thus it is easy to find a curve (over the field F_q) with order q - 1.
Note r^2^ divides q - 1, thus we have an embedding degree of 1.

Hence all computations necessary for the pairing can be done in F_q alone.
There is never any need to extend F_q.

As q is typically 1024 bits, group elements take a lot of space to represent.
Moreover, many optimizations do not apply to this type, resulting in a slower
pairing.

+e_param+ struct fields:

  exp2, exp1, sign1, sign0, r:
    r = 2^exp2 + sign1 * 2^exp1 + sign0 * 1 (Solinas prime)
  q, h
    q = h r^2 + 1 where r is prime, and h is 28 times a perfect square
  a, b
    E: y^2 = x^3 + ax + b

=== Type F internals ===

Using carefully crafted polynomials, k = 12 pairings can be constructed.
Only 160 bits are needed to represent elements of one group, and 320 bits
for the other.

Also, embedding degree k = 12 allows higher security short signatures.
(k = 6 curves cannot
be used to scale security from 160-bits to say 256-bits because finite
field attacks are subexponential.)

+f_param+ struct fields:

  q:
    The curve is defined over Fq
  r:
    The order of the curve.
  b:
    E: y^2= x^3 + b
  beta:
    A quadratic nonresidue in Fq: used in quadratic extension.
  alpha0, alpha1:
    x^6 + alpha0 + alpha1 sqrt(beta) is irreducible: used in sextic extension.

Discovered by Barreto and Naehrig, "Pairing-friendly elliptic curves of prime order".

=== Type G Internals ===

Another construction based on the CM method.

+g_param+ struct fields:

  q, n, h, r:
    h * r = n is the order of E(F_q)
  a, b:
    E: y^2 = x^3 + ax + b
  nk:
    #E(F_q^10)
  hk:
    hk * r^2 = nk
  coeff:
    array of coefficients of polynomial used for quintic extension.
  nqr:
    a quadratic nonresidue

+g_param+ struct fields:

Discovered by Freeman, "Constructing pairing-friendly elliptic curves with embedding degree 10."

=== Type I Internals ===

Type I pairings is symmetric, constructed on a supersingular curve
y^2^ = x^3^ - x + 1 over a ternary extension field F_{3^m^}.
The embedding degree k is 6.
Both G1 and G2 are the group of points E(F_{3^m^}).
GT is a subgroup of F_{3^6*m^}. The group order is a prime number.

parameters:

  m, t:
    The ternary extension field is F(3)[x]/(x^m^ + x^t^ + 2).
  n:
    the order of G1
  n2:
    n * n2 = number of points in E(F_{3^m^})

Introduced by Barreto et al, "Efficient Pairing Computation on Supersingular
Abelian Varieties", Designs, Codes and Cryptography, vol. 42, no. 3, pp. 239-271,
Mar. 2007.

=== Testing functions ===

For testing, debugging, demonstrations and benchmarks.
Declared in +pbc_test.h+:

include::gen/test.txt[]

=== Dynamic arrays ===

The +darray_t+ data type manages an array of pointers of type +void \*+,
allocating more memory when necessary.
Declared in +pbc_darray.h+.

include::gen/darray.txt[]

=== Symbol tables ===

The +symtab_t+ data type manages symbol tables where the keys are strings of
type +char \*+ and the values are pointers of type +void \*+.

At present, they are implemented inefficiently using dynamic arrays, but this
will change if the need arises. They are only used when reading a +pbc_param_t+
from a string. Declared in +pbc_symtab.h+.

include::gen/symtab.txt[]

=== Religious stances ===

I chose C because:

- GMP, which PBC requires and is also modeled on, is also written in C.
- PBC is intended to be a low-level portable cryptographic library.  C is the
  least common denominator. It should not be difficult to wrap PBC for other
  languages.
- Despite its drawbacks (I would appreciate operator overloading and
  genericity, and to a lesser extent garbage collection), I've found few
  languages I like better.  To quote Rob Pike, C is the desert island language.
  (I also agree with his statement that OO languages conceptually provide
  little extra over judicious use of function pointers in C.)

With respect to indentation, I'm migrating the code to follow
http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml[Google C++
Style Guide] to avoid having to switch styles all the time.
The code was originally written using my old style: 4-space indent with 1TBS
(One True Brace Style).

I'd like to have no library dependencies (except standard C libraries),
but then I'd have to write a large integer library. Furthermore, I'd have to
write it in assembly, and then port it.

To avoid this, I use an existing library. I selected GMP because the library's
focus is on multiprecision arithmetic and nothing else, and it aims to be as
fast as possible on many platforms.  Another important factor is that GMP is
released under a free license.

On the other hand, GMP is written to deal with extremely large numbers, while I
mostly only need integers that are roughly between 160 and 2048 bits. It is
possible a library specializing in numbers of these sizes would be better for
PBC.

I'm fond of GMP's method for eliminating the need for the +&amp;+ and +*+
operators most of the time by declaring a typedef on arrays of size 1.  I try
to do the same with PBC for consistency, though this trick does have drawbacks.

I would like to have GMP as the only library dependency, though I do not mind
using other libraries so long as they are optional. For example, one of the
test programs is much easier to use if compiled with the GNU readline library,
but by default compiles without it and is still functional.

I dislike the C preprocessor. I like to place platform-specific code in
separate files and let the build system work out which one to use.  Integer
constants can be defined with enum instead.  I intend to minimize the number of
+#include+ statements in header files for PBC's internal use as much as
possible (they should be in the `.c` files instead), and later perhaps even
remove those annoying +#ifndef+ statements too.
I grudgingly accept some macros for PBC's debugging features.

I liberally use nested functions, a GNU C extension. I find their expressiveness so indispensable that I'm willing to sacrifice portability for them.

The
http://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html[GNU libc manual]
states that data types ending in +_t+ should not be used because they are
reserved for future additions to C or POSIX.  On the other hand, I want to stay
consistent with GMP, and ending data types with +_t+ is common practice.