These are itemized GMP development tasks. Not all the tasks listed here are suitable for volunteers, but many of them are. Please see the projects file for more sizeable projects.
_LONG_LONG_LIMB in mpir.h is not namespace clean. Reported by Patrick Pelissier. We sort of mentioned _LONG_LONG_LIMB in past releases, so we need to be careful about changing it. It used to be a define applications had to set for long long limb systems, but that in particular is no longer relevant now that it's established automatically.
_mpz_realloc with a small (1 limb) size.
mpz_XXX(a,a,a).
For mpf_t numbers with exponents >2^53 on machines with 64-bit mp_exp_t, the precision of __mp_bases[base].chars_per_bit_exactly is insufficient and mpf_get_str aborts. Detect and compensate. Alternately, think seriously about using some sort of fixed-point integer value. Avoiding unnecessary floating point is probably a good thing in general, and it might be faster on some CPUs.
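As a rough illustration of the fixed-point idea (not current code; the scaling, names and constant are assumptions):

    /* Hypothetical fixed-point replacement for chars_per_bit_exactly, scaled
       by 2^16; the constant and names are illustrative only.  */
    #include <stddef.h>

    #define CPB_SHIFT 16
    #define CPB_Q16_BASE10 19729  /* log(2)/log(10) = 0.30103, times 2^16, rounded up */

    static size_t
    chars_for_bits (size_t bits, unsigned long cpb_q16)
    {
      /* Round up and add 1, as the float code aims to do.  A real version
         would also guard the multiply against overflow, which is exactly
         the >2^53 exponent problem being discussed.  */
      return (size_t) ((bits * cpb_q16 + ((1UL << CPB_SHIFT) - 1)) >> CPB_SHIFT) + 1;
    }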
mpf_eq is not always correct, when one operand is 1000000000... and the other operand is 0111111111..., i.e., extremely close. There is a special case in mpf_sub for this situation; put similar code in mpf_eq. [In progress.]
mpf_eq doesn't implement what gmp.texi specifies. It should not use just whole limbs, but partial limbs. [In progress.]
mpf_set_str doesn't validate its exponent, for instance garbage 123.456eX789X is accepted (and an exponent 0 used), and overflow of a long is not detected.
mpf_add doesn't check for a carry from truncated portions of the inputs, and in that respect doesn't implement the "infinite precision followed by truncate" specified in the manual.
mpz_add etc, which doesn't work when those routines are coming from a DLL (because they're effectively function pointer global variables themselves). Need to rearrange perhaps to a set of calls to a test function rather than iterating over an array.
mpz_pow_ui: Detect when the result would be more memory than a size_t can represent and raise some suitable exception, probably an alloc call asking for SIZE_T_MAX, and if that somehow succeeds then an abort. Various size overflows of this kind are not handled gracefully, probably resulting in segvs.
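A minimal sketch of the kind of overflow check meant, with illustrative names (the real test would sit next to the allocation in the pow code):

    #include <stddef.h>
    #include <mpir.h>   /* or gmp.h; provides mp_limb_t */

    /* Return non-zero if roughly base_limbs * e result limbs can be expressed
       in bytes as a size_t, i.e. the allocation request itself cannot wrap.  */
    static int
    pow_alloc_fits (size_t base_limbs, unsigned long e)
    {
      size_t max_limbs = ((size_t) -1) / sizeof (mp_limb_t);
      return e == 0 || base_limbs <= max_limbs / e;
    }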
In mpz_n_pow_ui, detect when the count of low zero bits exceeds an unsigned long. There's a (small) chance of this happening but still having enough memory to represent the value. Reported by Winfried Dreckmann in for instance mpz_ui_pow_ui (x, 4UL, 1431655766UL).
mpf: Detect exponent overflow and raise some exception. It'd be nice to allow the full mp_exp_t range since that's how it's been in the past, but maybe dropping one bit would make it easier to test whether e1+e2 goes out of bounds.
mpf_cmp: For better cache locality, don't test for low zero limbs until the high limbs fail to give an ordering. Reduce code size by turning the three mpn_cmp's into a single loop stopping when the end of one operand is reached (and then looking for a non-zero in the rest of the other).
mpf_mul_2exp, mpf_div_2exp: The use of mpn_lshift for any size<=prec means repeated mul_2exp and div_2exp calls accumulate low zero limbs until size==prec+1 is reached. Those zeros will slow down subsequent operations, especially if the value is otherwise only small. If low bits of the low limb are zero, use mpn_rshift so as to not increase the size.
mpn_dc_sqrtrem: Don't use mpn_addmul_1 with multiplier==2, instead use either mpn_addlsh1_n when available, or mpn_lshift+mpn_add_n if not.
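A sketch of the suggested fallback, using only public mpn calls and a caller-supplied scratch area (names are illustrative):

    #include <mpir.h>   /* or gmp.h */

    /* rp[0..n-1] += 2 * up[0..n-1], returning the carry (0, 1 or 2).  Where
       mpn_addlsh1_n is not available, do a shift into scratch then an add.  */
    static mp_limb_t
    addlsh1_n_fallback (mp_ptr rp, mp_srcptr up, mp_size_t n, mp_ptr scratch)
    {
      mp_limb_t cy;
      cy = mpn_lshift (scratch, up, n, 1);   /* scratch = 2*up, cy = bit shifted out */
      cy += mpn_add_n (rp, rp, scratch, n);  /* rp += scratch */
      return cy;
    }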
mpn_dc_sqrtrem, mpn_sqrtrem2: Don't use mpn_add_1 and mpn_sub_1 for 1 limb operations, instead ADDC_LIMB and SUBC_LIMB.
mpn_sqrtrem2: Use plain variables for the sp[0] and rp[0] calculations, so the compiler needn't worry about aliasing between sp and rp.
mpn_sqrtrem: Some work can be saved in the last step when the remainder is not required, as noted in Paul's paper.
mpq_add, mpq_sub: The division "op1.den / gcd" is done twice, where of course only once is necessary. Reported by Larry Lambe.
mpq_add, mpq_sub: The gcd fits a single limb with high probability, and in this case modlimb_invert could be used to calculate the inverse just once for the two exact divisions "op1.den / gcd" and "op2.den / gcd", rather than letting mpn_divexact_1 do it each time. This would require a new mpn_preinv_divexact_1 interface. Not sure if it'd be worth the trouble.
mpq_add, mpq_sub: The use of mpz_mul(x,y,x) causes temp allocation or copying in mpz_mul which can probably be avoided. A rewrite using mpn might be best.
mpn_gcdext: Don't test count_leading_zeros for zero, instead check the high bit of the operand and avoid invoking count_leading_zeros. This is an optimization on all machines, and significant on machines with slow count_leading_zeros, though it's possible an already normalized operand might not be encountered very often.
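The check amounts to something like the following sketch (GMP_NUMB_HIGHBIT and count_leading_zeros being the internal macros from gmp-impl.h and longlong.h):

    /* Skip count_leading_zeros when the operand is already normalized.  */
    #define CLZ_UNLESS_NORMALIZED(cnt, high)        \
      do {                                          \
        if ((high) & GMP_NUMB_HIGHBIT)              \
          (cnt) = 0;                                \
        else                                        \
          count_leading_zeros (cnt, high);          \
      } while (0)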
umul_ppmm could be rewritten to use floating-point for generating the most significant limb (if BITS_PER_MP_LIMB <= 52 bits). (Peter Montgomery has some ideas on this subject.)
umul_ppmm code in longlong.h: Add partial products with fewer operations.
mpz_set_ui could be inlined. This would be both small and fast, especially for compile-time constants, but would make application binaries depend on having 1 limb allocated to an mpz_t, preventing the "lazy" allocation scheme below.
mpz_[cft]div_ui and maybe mpz_[cft]div_r_ui could be inlined. A __gmp_divide_by_zero would be needed for the divide by zero test, unless that could be left to mpn_mod_1 (not sure currently whether all the risc chips provoke the right exception there if using mul-by-inverse).
mpz_fits_s*_p could be inlined. The setups for LONG_MAX etc would need to go into mpir.h, and on Cray it might, unfortunately, be necessary to forcibly include <limits.h> since there's no apparent way to get SHRT_MAX with an expression (since short and unsigned short can be different sizes).
mpz_powm and mpz_powm_ui aren't very fast on one or two limb moduli, due to a lot of function call overheads. These could perhaps be handled as special cases.
mpz_powm and mpz_powm_ui want better algorithm selection, and the latter should use REDC. Both could change to use an mpn_powm and mpn_redc.
mpz_powm REDC should do multiplications by g[] using the division method when they're small, since the REDC form of a small multiplier is normally a full size product. Probably would need a new tuned parameter to say what size multiplier is "small", as a function of the size of the modulus.
mpz_powm REDC should handle even moduli if possible. Maybe this would mean for m=n*2^k doing mod n using REDC and an auxiliary calculation mod 2^k, then putting them together at the end.
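A rough mpz-level sketch of that recombination, for m = n*2^k with n odd and k >= 1; a real implementation would work at the mpn level and reuse the REDC machinery:

    #include <mpir.h>   /* or gmp.h */

    /* r = b^e mod (n * 2^k): compute the two halves and stitch with CRT.  */
    static void
    powm_even_sketch (mpz_t r, const mpz_t b, const mpz_t e,
                      const mpz_t n, unsigned long k)
    {
      mpz_t r1, r2, t, ninv;
      mpz_init (r1); mpz_init (r2); mpz_init (t); mpz_init (ninv);

      mpz_powm (r1, b, e, n);            /* b^e mod n, odd modulus, REDC-friendly */
      mpz_set_ui (t, 1);
      mpz_mul_2exp (t, t, k);            /* t = 2^k */
      mpz_powm (r2, b, e, t);            /* b^e mod 2^k */

      /* CRT: r = r1 + n * ((r2 - r1) * n^-1 mod 2^k), which lies in [0, n*2^k) */
      mpz_invert (ninv, n, t);
      mpz_sub (r2, r2, r1);
      mpz_mul (r2, r2, ninv);
      mpz_fdiv_r_2exp (r2, r2, k);
      mpz_mul (r2, r2, n);
      mpz_add (r, r1, r2);

      mpz_clear (r1); mpz_clear (r2); mpz_clear (t); mpz_clear (ninv);
    }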
mpn_gcd might be able to be sped up on small to moderate sizes by improving find_a, possibly just by providing an alternate implementation for CPUs with slowish count_leading_zeros.
mpn_divexact_by3c exists.
mpf_set_str produces low zero limbs when a string has a fraction but is exactly representable, eg. 0.5 in decimal. These could be stripped to save work in later operations.
mpz_and, mpz_ior and mpz_xor should use mpn_and_n etc for the benefit of the small number of targets with native versions of those routines. Need to be careful not to pass size==0. Is some code sharing possible between the mpz routines?
mpf_add: Don't do a copy to avoid overlapping operands unless it's really necessary (currently only sizes are tested, not whether r really is u or v).
mpf_add: Under the check for v having no effect on the result, perhaps test for r==u and do nothing in that case, whereas currently it looks like an MPN_COPY_INCR will be done to reduce prec+1 limbs to prec.
mpf_div_ui: Instead of padding with low zeros, call mpn_divrem_1 asking for fractional quotient limbs.
mpf_div_ui: Eliminate TMP_ALLOC. When r!=u there's no overlap and the division can be called on those operands. When r==u and is prec+1 limbs, then it's an in-place division. If r==u and not prec+1 limbs, then move the available limbs up to prec+1 and do an in-place division there.
mpf_div_ui: Whether the high quotient limb is zero can be determined by testing the dividend for high<divisor. When non-zero, the division can be done on prec dividend limbs instead of prec+1. The result size is also known before the division, so that can be a tail call (once the TMP_ALLOC is eliminated).
mpn_divrem_2 could usefully accept unnormalized divisors and shift the dividend on-the-fly, since this should cost nothing on superscalar processors and avoid the need for temporary copying in mpn_tdiv_qr.
mpf_sqrt: If r!=u, and if u doesn't need to be padded with zeros, then there's no need for the tp temporary.
mpq_cmp_ui could form the num1*den2 and num2*den1 products limb-by-limb from high to low and look at each step for values differing by more than the possible carry bit from the uncalculated portion.
mpq_cmp could do the same high-to-low progressive multiply and compare. The benefits of karatsuba and higher multiplication algorithms are lost, but if it's assumed only a few high limbs will be needed to determine an order then that's fine.
mpn_add_1, mpn_sub_1, mpn_add, mpn_sub: Internally use __GMPN_ADD_1 etc instead of the functions, so they get inlined on all compilers, not just gcc and others with inline recognised in mpir.h. __GMPN_ADD_1 etc are meant mostly to support application inline mpn_add_1 etc, and if they don't come out good for internal uses then special forms can be introduced, for instance many internal uses are in-place. Sometimes a block of code is executed based on the carry-out, rather than using it arithmetically, and those places might want to do their own loops entirely.
__gmp_extract_double on 64-bit systems could use just one bitfield for the mantissa extraction, not two, when endianness permits. Might depend on the compiler allowing long long bit fields when that's the only actual 64-bit type.
TMP_FREE releases all memory, so there's an allocate and free every time a top-level function using TMP is called. Would need mp_set_memory_functions to tell tal-notreent.c to release any cached memory when changing allocation functions though.
__gmp_tmp_alloc from tal-notreent.c could be partially inlined. If the current chunk has enough room then a couple of pointers can be updated; only if more space is required would a call to some sort of __gmp_tmp_increase be needed. The requirement that TMP_ALLOC is an expression might make the implementation a bit ugly and/or a bit sub-optimal.
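Something along these lines, with the chunk structure and the __gmp_tmp_increase name being assumptions rather than the current tal-notreent.c interface:

    #include <stddef.h>

    struct tmp_chunk { char *cur, *end; };
    extern struct tmp_chunk __gmp_tmp_current;
    void *__gmp_tmp_increase (size_t n);       /* out-of-line: get a new chunk */

    static inline void *
    tmp_alloc_inline (size_t n)
    {
      n = (n + 15) & ~(size_t) 15;             /* keep alignment */
      if ((size_t) (__gmp_tmp_current.end - __gmp_tmp_current.cur) >= n)
        {
          void *p = __gmp_tmp_current.cur;     /* fast path: bump the pointer */
          __gmp_tmp_current.cur += n;
          return p;
        }
      return __gmp_tmp_increase (n);           /* slow path */
    }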
__mp_bases has a lot of data for bases which are pretty much never used. Perhaps the table should just go up to base 16, and have code to generate data above that, if and when required. Naturally this assumes the code would be smaller than the data saved.
The __mp_bases field big_base_inverted is only used if USE_PREINV_DIVREM_1 is true, and could be omitted otherwise, to save space.
mpz_get_str, mtox: For power-of-2 bases, which are of course fast, it seems a little silly to make a second pass over the mpn_get_str output to convert to ASCII. Perhaps combine that with the bit extractions.
mpz_gcdext: If the caller requests only the S cofactor (of A), and A<B, then the code ends up generating the cofactor T (of B) and deriving S from that. Perhaps it'd be possible to arrange to get S in the first place by calling mpn_gcdext with A+B,B. This might only be an advantage if A and B are about the same size.
mpz_n_pow_ui does a good job with small bases and stripping powers of 2, but it's perhaps a bit too complicated for what it gains. The simple mpn_pow_1 is a little faster on small exponents. (Note some of the ugliness in mpz_n_pow_ui is due to supporting mpn_mul_2.)
Perhaps the stripping of 2s in mpz_n_pow_ui should be confined to single limb operands for simplicity and since that's where the greatest gain would be.
Ideally mpn_pow_1 and mpz_n_pow_ui would be merged. The reason mpz_n_pow_ui writes to an mpz_t is that its callers leave it to make a good estimate of the result size. Callers of mpn_pow_1 already know the size by separate means (mp_bases).
mpz_invert should call mpn_gcdext directly.
invert_limb on various processors might benefit from the little Newton iteration done for alpha and ia64.
mpn_addlsh1_n could be implemented with mpn_addmul_1, since that code at 3.5 is a touch faster than a separate lshift and add_n at 1.75+2.125=3.875. Or very likely some specific addlsh1_n code could beat both.
mpn_mul_1, mpn_addmul_1, and mpn_submul_1.
mpn_mul_1, mpn_addmul_1, and mpn_submul_1 for the 21164. This should use both integer multiplies and floating-point multiplies. For the floating-point operations, the single-limb multiplier should be split into three 21-bit chunks, or perhaps even better into four 16-bit chunks. Probably possible to reach 9 cycles/limb.
__builtin_ctzl, __builtin_clzl and __builtin_popcountl, using the corresponding CIX ct instructions, and __builtin_alpha_cmpbge. These should give GCC more information about scheduling etc than the asm blocks currently used in longlong.h and gmp-impl.h.
alloca on this system, making configure choose the slower malloc-reentrant allocation method. Is there a better way? Maybe variable-length arrays per notes below.
.align is not used since it pads with garbage. Does the code get the intended slotting required for the claimed speeds? .align at the start of a function would presumably be safe no matter how it pads.
count_leading_zeros can use the clz instruction. For GCC 3.4 and up, do this via __builtin_clzl since then gcc knows it's "predicable".
__builtin_popcount can be used instead of an asm block. The builtin should give gcc more opportunities for scheduling, bundling and predication. __builtin_ctz similarly (it just uses popcount as per current longlong.h).
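For example, longlong.h-style definitions could simply wrap the builtins (the builtin chosen must match the limb type; shown here for a limb that is unsigned long):

    #if defined (__GNUC__)
    #define popc_limb(count, x)             ((count) = __builtin_popcountl (x))
    #define count_trailing_zeros(count, x)  ((count) = __builtin_ctzl (x))  /* x != 0 */
    #endif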
mpn_mul_1, mpn_addmul_1, for s2 < 2^32 (or perhaps for any zero 16-bit s2 chunk). Not sure how much this can improve the speed, though, since the symmetry that we rely on is lost. Perhaps we can just gain cycles when s2 < 2^16, or more accurately, when two 16-bit s2 chunks which are 16 bits apart are zero.
mpn_submul_1, analogous to mpn_addmul_1.
umul_ppmm. Using four "mulx"s either with an asm block or via the generic C code is about 90 cycles. Try using fp operations, and also try using karatsuba for just three "mulx"s.
mpn_lshift, mpn_rshift. Will give 2 cycles/limb. Trivial modifications of mpn/sparc64 should do.
mulx for umul_ppmm if possible (see commented out code in longlong.h). This is unlikely to save more than a couple of cycles, so perhaps isn't worth bothering with.
__sparc_v9__ or anything to indicate V9 support when -mcpu=v9 is selected. See gcc/config/sol2-sld-64.h. Will need to pass something through from ./configure to select the right code in longlong.h. (Currently nothing is lost because mulx for multiplying is commented out.)
mpn_divexact_1 and mpn_modexact_1c_odd can use a 64-bit inverse and take 64-bits at a time from the dividend, as per the 32-bit divisor case in mpn/sparc64/mode1o.c. This must be done in assembler, since the full 64-bit registers (%gN) are not available from C.
mpn_divexact_by3c can work 64-bits at a time using mulx, in assembler. This would be the same as for sparc64.
modlimb_invert might save a few cycles from masking down to just the useful bits at each point in the calculation, since mulx speed depends on the highest bit set. Either explicit masks or small types like short and int ought to work.
popc: This chip reputedly implements popc properly (see gcc sparc.md). Would need to recognise it as sparchalr1 or something in configure / config.sub / config.guess. popc_limb in gmp-impl.h could use this (per commented out code). count_trailing_zeros could use it too.
mpn_addmul_1, mpn_submul_1, and mpn_mul_1. The current code runs at 11 cycles/limb. It should be possible to saturate the cache, which will happen at 8 cycles/limb (7.5 for mpn_mul_1). Write special loops for s2 < 2^32; it should be possible to make them run at about 5 cycles/limb.
powerpc*.
mpn_addmul_1, mpn_submul_1, and mpn_mul_1. Use both integer and floating-point operations, possibly two floating-point and one integer limb per loop. Split operands into four 16-bit chunks for fast fp operations. Should easily reach 9 cycles/limb (using one int + one fp), but perhaps even 7 cycles/limb (using one int + two fp).
mpn_rshift could do the same sort of unrolled loop as mpn_lshift. Some judicious use of m4 might let the two share source code, or with a register to control the loop direction perhaps even share object code.
mpn_mul_basecase and mpn_sqr_basecase for important machines. Helping the generic sqr_basecase.c with an mpn_sqr_diagonal might be enough for some of the RISCs.
mpn_lshift/mpn_rshift. Will bring time from 1.75 to 1.25 cycles/limb.
mpn_lshift for shifts by 1. (See Pentium code.)
rep movs would upset GCC register allocation for the whole function. Is this still true in GCC 3? It uses rep movs itself for __builtin_memcpy. Examine the code for some simple and complex functions to find out. Inlining rep movs would be desirable, it'd be both smaller and faster.
mpn_lshift and mpn_rshift can come down from 6.0 c/l to 5.5 or 5.375 by paying attention to pairing after shrdl and shldl, see mpn/x86/pentium/README.
mpn_lshift and mpn_rshift might benefit from some destination prefetching.
mpn_divrem_1 might be able to use a mul-by-inverse, hoping for maybe 30 c/l.
mpn_lshift and mpn_rshift might be able to do something branch-free for unaligned startups, and shaving one insn from the loop with alternative indexing might save a cycle.
mpn_lshift. The pipeline is now extremely deep, perhaps unnecessarily deep.
mpn_mul_basecase and mpn_sqr_basecase. This should use a "vertical multiplication method" to avoid carry propagation, splitting one of the operands into 11-bit chunks.
mpn_lshift by 31 should use the special shift by 1 code, and vice versa mpn_rshift by 31 should use the special lshift by 1. This would be best as a jump across to the other routine, which could let both live in lshift.asm and omit rshift.asm on finding mpn_rshift already provided.
mpn_com_n and mpn_and_n etc very probably want a pragma like MPN_COPY_INCR.
mpn_lshift, mpn_rshift, mpn_popcount and mpn_hamdist are nice and small and could be inlined to avoid function calls.
TMP_ALLOC to use them, or introduce a new scheme. Memory blocks wanted unconditionally are easy enough, those wanted only sometimes are a problem. Perhaps a special size calculation to ask for a dummy length 1 when unwanted, or perhaps an inlined subroutine duplicating code under each conditional. Don't really want to turn everything into a dog's dinner just because Cray don't offer an alloca.
mpn_get_str on power-of-2 bases ought to vectorize. Does it? bits_per_digit and the inner loop over bits in a limb might prevent it. Perhaps special cases for binary, octal and hex would be worthwhile (very possibly for all processors too).
BSWAP_LIMB_FETCH looks like it could be done with lrvg, as per glibc sysdeps/s390/s390-64/bits/byteswap.h. Is this only for 64-bit mode or something, since 32-bit mode has other code? Also, is it worth using for BSWAP_LIMB too, or would that mean a store and re-fetch? Presumably that's what comes out in glibc.
count_leading_zeros for 64-bit machines:
__inline which could perhaps be used in __GMP_EXTERN_INLINE. What would be the right way to identify suitable versions of that compiler?
cc is rumoured to have an _int_mult_upper (in <intrinsics.h> like Cray), but it didn't seem to exist on some IRIX 6.5 systems tried. If it does actually exist somewhere it would very likely be an improvement over a function call to umul.asm.
mpn_get_str final divisions by the base with udiv_qrnd_unnorm could use some sort of multiply-by-inverse on suitable machines. This ends up happening for decimal by presenting the compiler with a run-time constant, but the same for other bases would be good. Perhaps use could be made of the fact base<256.
mpn_umul_ppmm, mpn_udiv_qrnnd: Return a structure like div_t to avoid going through memory, in particular helping RISCs that don't do store-to-load forwarding. Clearly this is only possible if the ABI returns a structure of two mp_limb_ts in registers.
On PowerPC, structures are returned in memory on AIX and Darwin. In SVR4 they're returned in registers, except that draft SVR4 had said memory, so it'd be prudent to check which is done. We can jam the compiler into the right mode if we know how, since all this is purely internal to libmpir. (gcc has an option, though of course gcc doesn't matter since we use inline asm there.)
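A sketch of the struct-returning interface; the type name is made up and the multiply below is a plain C stand-in for umul_ppmm:

    #include <mpir.h>   /* or gmp.h; provides mp_limb_t and GMP_LIMB_BITS */

    typedef struct { mp_limb_t hi, lo; } mp_limb_pair_t;

    static mp_limb_pair_t
    umul_ppmm_struct (mp_limb_t a, mp_limb_t b)
    {
      const int h = GMP_LIMB_BITS / 2;
      const mp_limb_t mask = ((mp_limb_t) 1 << h) - 1;
      mp_limb_t al = a & mask, ah = a >> h, bl = b & mask, bh = b >> h;
      mp_limb_t m0 = al * bl, m1 = al * bh, m2 = ah * bl, m3 = ah * bh;
      mp_limb_pair_t r;

      m1 += m0 >> h;                   /* cannot overflow */
      m1 += m2;                        /* may wrap; the lost 2^(2h) goes to hi */
      if (m1 < m2)
        m3 += (mp_limb_t) 1 << h;

      r.hi = m3 + (m1 >> h);
      r.lo = (m1 << h) | (m0 & mask);
      return r;                        /* two limbs, ideally back in registers */
    }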
mpz_crr (Chinese Remainder Reconstruction).
mpz_init and mpq_init could do lazy allocation. Set ALLOC(var) to 0 to indicate nothing allocated, and let _mpz_realloc do the initial alloc. Set z->_mp_d to a dummy that mpz_get_ui and similar can unconditionally fetch from. Niels Möller has had a go at this.
The advantages of the lazy scheme would be (see the sketch after this list):
mpz_init and then more or less immediately reallocating.
mpz_init would only store magic values in the mpz_t fields, and could be inlined.
mpz_t z = MPZ_INITIALIZER; would become possible, which might be convenient for globals.
mpz_set_ui and other similar routines needn't check the size allocated and can just store unconditionally.
mpz_set_ui and perhaps others like mpz_tdiv_r_ui and a prospective mpz_set_ull could be inlined.
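A sketch of what the lazy scheme could look like, assuming the current __mpz_struct field order (_mp_alloc, _mp_size, _mp_d); none of this exists today:

    #include <mpir.h>   /* or gmp.h */

    static mp_limb_t __gmp_dummy_limb = 0;   /* shared fetch target for "no data" */

    /* ALLOC==0 marks "nothing allocated"; _mpz_realloc would do the first alloc */
    #define MPZ_INITIALIZER  {{ 0, 0, &__gmp_dummy_limb }}

    /* usage: a global usable without a run-time mpz_init call */
    mpz_t counter = MPZ_INITIALIZER;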
mpf_out_raw and mpf_inp_raw: Make sure the format is portable between 32-bit and 64-bit machines, and between little-endian and big-endian machines. A format which MPFR can use too would be good.
mpn_and_n ... mpn_copyd: Perhaps make the mpn logops and copies available in mpir.h, either as library functions or inlines, with the availability of library functions instantiated in the generated mpir.h at build time.
mpz_set_str etc variants taking string lengths rather than null-terminators.
mpz_andn, mpz_iorn, mpz_nand, mpz_nior, mpz_xnor might be useful additions, if they could share code with the current such functions (which should be possible).
mpz_and_ui etc might be of use sometimes. Suggested by Niels Möller.
mpf_set_str and mpf_inp_str could usefully accept 0x, 0b etc when base==0. Perhaps the exponent could default to decimal in this case, with a further 0x, 0b etc allowed there. Eg. 0xFFAA@0x5A. A leading "0" for octal would match the integers, but probably something like "0.123" ought not mean octal.
GMP_LONG_LONG_LIMB or some such could become a documented feature of mpir.h, so applications could know whether to printf a limb using %lu or %Lu.
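An application could then do something like the following (GMP_LONG_LONG_LIMB is the proposed name, not an existing feature; %llu is the C99 spelling of %Lu):

    #include <stdio.h>
    #include <mpir.h>   /* or gmp.h */

    static void
    print_limb (mp_limb_t w)
    {
    #ifdef GMP_LONG_LONG_LIMB
      printf ("%llu\n", (unsigned long long) w);
    #else
      printf ("%lu\n", (unsigned long) w);
    #endif
    }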
GMP_PRIdMP_LIMB and similar defines following C99 <inttypes.h> might be of use to applications printing limbs. But if GMP_LONG_LONG_LIMB or whatever is added then perhaps this can easily enough be left to applications.
gmp_printf could accept %b for binary output. It'd be nice if it worked for plain int etc too, not just mpz_t etc.
gmp_printf in fact could usefully accept an arbitrary base, for both integer and float conversions. A base either in the format string or as a parameter with * should be allowed. Maybe &13b (b for base) or something like that.
gmp_printf could perhaps accept mpq_t for float conversions, eg. "%.4Qf". This would be merely for convenience, but still might be useful. Rounding would be the same as for an mpf_t (ie. currently round-to-nearest, but not actually documented). Alternately, perhaps a separate mpq_get_str_point or some such might be more use. Suggested by Pedro Gimeno.
mpz_rscan0 or mpz_revscan0 or some such searching towards the low end of an integer might match mpz_scan0 nicely. Likewise for scan1. Suggested by Roberto Bagnara.
mpz_bit_subset or some such to test whether one integer is a bitwise subset of another might be of use. Some sort of return value indicating whether it's a proper or non-proper subset would be good and wouldn't cost anything in the implementation. Suggested by Roberto Bagnara.
mpf_get_ld, mpf_set_ld: Conversions between mpf_t and long double, suggested by Dan Christensen. Other long double routines might be desirable too, but mpf would be a start.
long double is an ANSI-ism, so everything involving it would need to be suppressed on a K&R compiler.
There'd be some work to be done by configure to recognise the format in use; MPFR has a start on this. Often long double is the same as double, which is easy but pretty pointless. A single float format detector macro could look at double then long double.
Sometimes there's a compiler option for the size of a long double, eg. xlc on AIX can use either 64-bit or 128-bit. It's probably simplest to regard this as a compiler compatibility issue, and leave it to users or sysadmins to ensure application and library code are built the same.
mpz_sqrt_if_perfect_square: When mpz_perfect_square_p does its tests it calculates a square root and then discards it. For some applications it might be useful to return that root. Suggested by Jason Moxham.
mpz_get_ull, mpz_set_ull, mpz_get_sll, mpz_set_sll: Conversions for long long. These would aid interoperability, though a mixture of GMP and long long would probably not be too common. Since long long is not always available (it's in C99 and GCC though), disadvantages of using long long in libmpir.a would be:
An #ifdef block to decide if the application compiler could take the long long prototypes.
A LIBGMP_HAS_LONGLONG might be wanted to indicate whether the functions are available. (Applications using autoconf could probe the library too.)
Alternately, long long could be confined to application compile time, by having something like mpz_set_2ui called with two halves of a long long (a sketch of the two-halves approach follows below). Disadvantages of this would be:
long long is normally passed as two halves anyway.
mpz_get_ull would be a rather big inline, or would have to be two function calls.
mpz_get_sll would be a worse inline, and would put the treatment of -0x10..00 into applications (see mpz_get_si correctness above).
long long is probably the lesser evil, if only because it makes best use of gcc. In fact perhaps it would suffice to guarantee long long conversions only when using GCC for both application and library. That would cover free software, and we can worry about selected vendor compilers later.
In C++ the situation is probably clearer: we demand fairly recent C++, so long long should be available always. We'd probably prefer to have the C and C++ the same in respect of long long support, but it would be possible to have it unconditionally in mpirxx.h, by some means or another.
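For reference, the two-halves approach mentioned above (mpz_set_2ui style) can already be expressed with existing calls; a sketch, assuming long long is exactly 64 bits and z is non-negative for the get direction:

    #include <mpir.h>   /* or gmp.h */

    static void
    set_ull_sketch (mpz_t z, unsigned long long v)
    {
      mpz_set_ui (z, (unsigned long) (v >> 32));              /* high half */
      mpz_mul_2exp (z, z, 32);
      mpz_add_ui (z, z, (unsigned long) (v & 0xffffffffUL));  /* low half */
    }

    static unsigned long long
    get_ull_sketch (const mpz_t z)
    {
      mpz_t t;
      unsigned long long hi, lo;

      lo = mpz_get_ui (z) & 0xffffffffUL;     /* low 32 bits */
      mpz_init (t);
      mpz_tdiv_q_2exp (t, z, 32);
      hi = mpz_get_ui (t) & 0xffffffffUL;     /* next 32 bits */
      mpz_clear (t);
      return (hi << 32) | lo;
    }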
mpz_strtoz parsing the same as strtol. Suggested by Alexander Kruppa.
umul_ppmm in longlong.h always uses umull, but is that available only for M series chips or some such? Perhaps it should be configured in some way.
-mschedule=7200 etc parameter, which could be driven by an exact hppa cpu type.
AC_C_BIGENDIAN seems the best way to handle that for GMP.
*-*-aix*. It might be more reliable to do some sort of feature test, examining the compiler output perhaps. It might also be nice to merge the aix.m4 files into powerpc-defs.m4.
AC_OUTPUT would work, but it might upset "make" to have things like L$ get into the Makefiles through AC_SUBST. AC_CONFIG_COMMANDS would be the alternative. With some careful m4 quoting the changequote calls might not be needed, which might free up the order in which things had to be output.
CCAS, CCASFLAGS scheme. Though we probably wouldn't be using its assembler support we could try to use those variables in compatible ways.
GMP_LDFLAGS could probably be done with plain LDFLAGS already used by automake for all linking. But with a bit of luck the next libtool will pass pretty much all CFLAGS through to the compiler when linking, making GMP_LDFLAGS unnecessary.
-c and -o together in the .S and .asm rules, but apparently that isn't completely portable (there's an autoconf AC_PROG_CC_C_O test for it). So far we've not had problems, but perhaps the rules could be rewritten to use "foo.s" as the temporary, or to do a suitable "mv" of the result. The only danger from using foo.s would be if a compile failed and the temporary foo.s then looked like the primary source. Hopefully if the SUFFIXES are ordered to have .S and .asm ahead of .s that wouldn't happen. Might need to check.
_gmp_rand is not particularly fast on the linear congruential algorithm and could stand various improvements.
gmp_randstate_t (or _mp_algdata rather) to save some copying.
2exp modulus, to avoid mpn_mul calls. Perhaps the same for two limbs.
lc code, to avoid a function call and TMP_ALLOC for every chunk.
The 2exp and general LC cases should be split, for clarity (if the general case is retained).
gmp_randstate_t used for parameters perhaps should become gmp_randstate_ptr, the same as other types.
mpz_class(string), etc: Use the C++ global locale to identify whitespace.
mpf_class(string): Use the C++ global locale decimal point, rather than the C one.
Consider making these variant mpz_set_str etc forms available for mpz_t too, not just mpz_class etc.
mpq_class operator+=: Don't emit an unnecessary mpq_set(q,q) before mpz_addmul etc.
mpz_class(const char *), etc: since they're normally not fast anyway, and we can hide the exception throw.
mpz_class(string), etc: to hide the cstr needed to get to the C conversion function.
mpz_class string, char* etc constructors: likewise to hide the throws and conversions.
mpz_class::get_str, etc: to hide the char* to string conversion and free. Perhaps mpz_get_str can write directly into a string, to avoid copying.
Consider making such string returning variants available for use with plain mpz_t etc too.
mpz_gcdext and mpn_gcdext ought to document what range of values the generated cofactors can take, and preferably ensure the definition uniquely specifies the cofactors for given inputs. A basic extended Euclidean algorithm or multi-step variant leads to |x|<|b| and |y|<|a| or something like that, but there's probably two solutions under just those restrictions. (For instance with a=4, b=6, both (x,y)=(-1,1) and (x,y)=(2,-1) give 4x+6y=2 within those bounds.)
mpz_divisible_ui_p rather than mpz_tdiv_q_ui. (Of course dividing multiple primes at a time would be better still.)
libmpir. This establishes good cross-checks, but it might be better to use simple reference routines where possible. Where it's not possible some attention could be paid to the order of the tests, so a libmpir routine is only used for tests once it seems to be good.
MUL_FFT_THRESHOLD etc: the FFT thresholds should allow a return to a previous k at certain sizes. This arises basically due to the step effect caused by size multiples effectively used for each k. Looking at a graph makes it fairly clear.
__gmp_doprnt_mpf does a rather unattractive round-to-nearest on the string returned by mpf_get_str. Perhaps some variant of mpf_get_str could be made which would better suit.
ASSERTs at the start of each user-visible mpz/mpq/mpf function to check the validity of each mp?_t parameter, in particular to check they've been mp?_init'ed. This might catch elementary mistakes in user programs. Care would need to be taken over MPZ_TMP_INIT'ed variables used internally. If nothing else then consistency checks like size<=alloc, ptr not NULL and ptr+size not wrapping around the address space, would be possible. A more sophisticated scheme could track _mp_d pointers and ensure only a valid one is used. Such a scheme probably wouldn't be reentrant, not without some help from the system.
getrusage and gettimeofday are reliable. Currently we pretend in configure that the dodgy m68k netbsd 1.4.1 getrusage doesn't exist. If a test might take a long time to run then perhaps cache the result in a file somewhere.
speed_unittime determined, independent of the method in use.
sysconf(_SC_CLK_TCK), since it seems to be clock cycle based. Is this true for all Cray systems? Would like some documentation or something to confirm.
mpz_inp_str (etc) doesn't say when it stops reading digits.
mpn_get_str isn't terribly clear about how many digits it produces. It'd probably be possible to say at most one leading zero, which is what both it and mpz_get_str currently do. But we want to be careful not to bind ourselves to something that might not suit another implementation.
va_arg doesn't do the right thing with mpz_t etc directly, but instead needs a pointer type like MP_INT*. It'd be good to show how to do this, but we'd either need to document mpz_ptr and friends, or perhaps fall back on something slightly nasty with void*.
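For the record, the working idiom looks like this, using the currently undocumented mpz_ptr typedef from mpir.h (the function here is just an illustration):

    #include <stdarg.h>
    #include <mpir.h>   /* or gmp.h */

    /* result = sum of 'count' mpz_t arguments */
    static void
    sum_all (mpz_t result, int count, ...)
    {
      va_list ap;
      int i;

      mpz_set_ui (result, 0);
      va_start (ap, count);
      for (i = 0; i < count; i++)
        {
          mpz_ptr p = va_arg (ap, mpz_ptr);   /* not va_arg (ap, mpz_t) */
          mpz_add (result, result, p);
        }
      va_end (ap);
    }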
The following may or may not be feasible, and aren't likely to get done in the near future, but are at least worth thinking about.
mpn_umul_ppmm, and the corresponding umul.asm file could be included in libmpir only in that case, the same as is effectively done for __clz_tab. Likewise udiv.asm and perhaps cntlz.asm. This would only be a very small space saving, so perhaps not worth the complexity.
mpz_get_si returns 0x80000000 for -0x100000000, whereas it's sort of supposed to return the low 31 (or 63) bits. But this is undocumented, and perhaps not too important.
mpz_init_set* and mpz_realloc could allocate say an extra 16 limbs over what's needed, so as to reduce the chance of having to do a reallocate if the mpz_t grows a bit more. This could only be an option, since it'd badly bloat memory usage in applications using many small values.
mpq functions could perhaps check for numerator or denominator equal to 1, on the assumption that integers or denominator-only values might be expected to occur reasonably often.
count_tailing_zeros
is used on more or less uniformly
distibuted numbers in a couple of places. For some CPUs
count_tailing_zeros
is slow and it's probably worth handling
the fequently occurring 0 to 2 trailing zeros cases specially.
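For instance, a wrapper macro along these lines (n must be non-zero, as for count_trailing_zeros itself):

    #define COUNT_TRAILING_ZEROS_FAST(count, n)   \
      do {                                        \
        if ((n) & 1)                              \
          (count) = 0;                            \
        else if ((n) & 2)                         \
          (count) = 1;                            \
        else if ((n) & 4)                         \
          (count) = 2;                            \
        else                                      \
          count_trailing_zeros (count, n);        \
      } while (0)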
mpf_t might like to let the exponent be undefined when size==0, instead of requiring it to be 0 as now. It should be possible to do size==0 tests before paying attention to the exponent. The advantage is not needing to set exp in the various places a zero result can arise, which avoids some tedium but is otherwise perhaps not too important. Currently mpz_set_f and mpf_cmp_ui depend on exp==0, maybe elsewhere too.
__gmp_allocate_func: Could use GCC __attribute__ ((malloc)) on this, though don't know if it'd do much. GCC 3.0 allows that attribute on functions, but not function pointers (see info node "Attribute Syntax"), so this would need a new autoconf test. This can wait until there's a GCC that supports it.
mpz_add_ui contains two __GMPN_COPYs, one from mpn_add_1 and one from mpn_sub_1. If those two routines were opened up a bit maybe that code could be shared. When a copy needs to be done there's no carry to append for the add, and if the copy is non-empty no high zero for the sub.
The following tasks apply to chips or systems that are old and/or obsolete. It's unlikely anything will be done about them unless anyone is actively using them.
configure --nfp, but that option is gone now that autoconf is used. The file could go somewhere suitable in the mpn search if any chips might benefit from it, though it's possible we don't currently differentiate enough exact cpu types to do this properly.
double floats are straightforward and could perhaps be handled directly in __gmp_extract_double and maybe in mpn_get_d, rather than falling back on the generic code. (Both formats are detected by configure.)