|
View:
New views
9 Messages
—
Rating Filter:
Alert me
|
|
|
Toom symmetriesHi,
I've been looking into switching points for interpolate_7pts, from (0, oo, 1, -1, 2, 1/2, -1/2) to (0, oo, +1, -1, 2, -2, 1/2). Evaluation and interpolation is equivalent for most practical purposes, so the reason I tried this was very mundane: Then I can reuse the same helper function for evaluation in 2 and -2 for both toom44 and toom43 (the latter uses interpolation with the points 0, oo, 1, -1, 2, -2). This got me to look a little closer to the symmetries of the choosen points. The most symmetry we can have is if the set of points is invariant under both negation and inversion. Then the point set is a union of sets of the types {0, oo} Each point invariant under negation, and they are swapped by inversion. {1, -1} Each point invariant under inversion, and they are swapped by negation. {a, -a, 1/a, -1/a} Swapped pairwise by the two operations. For a point pair (a, -a), a reasonable interpolation strategy (which seems to be part of at least some of the optimal sequences) is to split in even and odd parts, by a butterfly-like operation (w_i, w_j) <- ( (w_i + w_j)/2, (w_i - w_j)/(2a) ) Then one question is how to best do this, with current operations that would be either one addsub followed by two shifts, or one rshadd and and rshsub. A combined operation might be useful. Currently, I don't think any interpolation code uses addsub. Furthermore, if we also have symmetry under inversion for all points, then the resulting matrices for even and odd parts are the same! The simplest example, 4 points, w0 = f(0) w1 = f(1) w2 = f(-1) w3 = f(oo) corresponds to the matrix 1 0 0 0 1 1 1 1 1 -1 1 -1 0 0 0 1 After addsub/2, we have 1 0 0 0 0 1 0 1 1 0 1 0 0 0 0 1 We get (1, 0; 1,1) for the even coefficients and (1,1; 0,1) for the even coefficients, which only differ by permutation. For the next example, say we use eight points, 0, oo, 1, -1, 2, -2, 1/2, -1/2. The full matrix is 1 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 -1 1 -1 1 -1 1 -1 1 2 4 8 16 32 64 128 1 -2 4 -8 16 -32 64 -128 128 64 32 16 8 4 2 1 128 -64 32 -16 8 -4 2 -1 0 0 0 0 0 0 0 1 After butterfly operations on the ± pairs, we get 1 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0 0 1 0 4 0 16 0 64 1 0 4 0 16 0 64 0 0 64 0 16 0 4 0 1 64 0 16 0 4 0 1 0 0 0 0 0 0 0 0 1 We get an even part 1 0 0 0 1 1 1 1 1 4 16 64 64 16 4 1 and an odd part 1 1 1 1 1 4 16 64 64 16 4 1 0 0 0 1 which again are equivalent. We can use the same row operations for both matrices, which means that 8 point interpolation can be implemented compactly using butterfly operations and reductions of the above 4x4 matrix as building blocks. Then the difficult question is choice of points. Next example with full symmetry like above would by 12 points, where we might dd the four points corresponding to a = 4, or a = 2^(GMP_NUMB_BITS) / 2. Also for 6 points, we will get more symmetry (and hence compactness of interpolation) if we use 1/2, -1/2 rather than +1 and -1. This is likely slower than the current strategy. Anyway, after butterflies, we get the same matrix 1 0 0 1 4 16 16 4 1 for odd and even parts, which can be reduced by w1 = w1 - w0 w2 = w2 - 16*w0 w1 = (w1 - w2)/15 w2 = (w2 - w1)/4 (I think this sequence is close to optimal). The butterfly operations (with no special addsub primitive) would be w2 = (w1 - w2)/2 w1 = w1 - w2 w4 = (w3 - w4)/2 w3 = (w3 - w4)/2 In total 12 subtractions, 2 left shifts, 5 right shifts and 2 divisions. This actually not so bad compared to the current code, there's just one more left shift (and then evaluation obviously also needs more shifts). Using the four points corresponding to a = sqrt(B) = 2^{GMP_NUMB_BITS /2} might also be worth investigating, in this case the same reduction procedure will involved only division by (B-1). /Niels _______________________________________________ gmp-devel mailing list gmp-devel@... http://gmplib.org/mailman/listinfo/gmp-devel |
|
|
Re: Toom symmetriesOn Jun 9, 2009, at 4:11 PM, Niels Möller wrote: > The simplest example, 4 points, > > w0 = f(0) > w1 = f(1) > w2 = f(-1) > w3 = f(oo) > > corresponds to the matrix > > 1 0 0 0 > 1 1 1 1 > 1 -1 1 -1 > 0 0 0 1 > > After addsub/2, we have > > 1 0 0 0 > 0 1 0 1 > 1 0 1 0 > 0 0 0 1 Hmmm... sorry to interrupt your train of thought, this gives me an idea. Maybe you can delay some of the division by two until after part of the recomposition? i.e.: instead of addsub/2, just do addsub, then you have 1 0 0 0 0 2 0 2 2 0 2 0 0 0 0 1 i.e. if the output poly coefficients are c0, c1, c2, c3, you have c0 2c1 + 2c3 2c0 + 2c2 c3 Now start doing recomposition: |=== 2c1 + 2c3 ===| |=== 2c0 + 2c2 ===| Then divide by 2: |==== c1 + c3 ====| |==== c0 + c2 ====| Then continue as usual, i.e. subtract c3, c0 from right places, add them at beginning and end (and also use the hi(c0) - lo(c3) redundancy trick...), to get |====== c0 =======| |====== c1 =======| |====== c2 =======| |====== c3 =======| I haven't checked the operation count, but it seems plausible that something like this could save half a shift. > In total 12 subtractions, 2 left shifts, 5 right shifts and 2 > divisions. This actually not so bad compared to the current code, > there's just one more left shift (and then evaluation obviously also > needs more shifts). Using the four points corresponding to a = sqrt(B) > = 2^{GMP_NUMB_BITS /2} might also be worth investigating, in this case > the same reduction procedure will involved only division by (B-1). I did try the 2^(GMP_NUMB_BITS/2) idea at some stage --- one difficulty is that the pointwise multiplications get slightly bigger. david _______________________________________________ gmp-devel mailing list gmp-devel@... http://gmplib.org/mailman/listinfo/gmp-devel |
|
|
Re: Toom symmetriesCiao!
>> The simplest example, 4 points, Unfortunately the best examples of symmetry came from the "Toom-and-a-half", Toom-2.5, Toom-3.5 and so on (as I called them with Zanoni). This means unbalanced only methods. The balanced always have an odd number of points... >> 1 0 0 0 >> 1 1 1 1 >> 1 -1 1 -1 >> 0 0 0 1 >> >> After addsub/2, we have >> >> 1 0 0 0 >> 0 1 0 1 >> 1 0 1 0 >> 0 0 0 1 An addsub/2 can be useful here, currently the effect of (a,b) <- ( (a+b)/2, (a-b)/2 ) is given by the sequence: a <- (a+b)/2 b <- a-b which implies one shift only. In other sequences I wrote, you can find something like: a <- a+b b <- 2*b b <- a-b this should be replaced by addsub (avoiding the shift) > Hmmm... sorry to interrupt your train of thought, this gives me an > idea. Maybe you can delay some of the division by two until after > part of the recomposition? [...] > I haven't checked the operation count, but it seems plausible that > something like this could save half a shift. There are a lot of tricks like this, the first one in Karatsuba (due to?) saving half an addition. There is one in Toom-3, saving half a sum again, and it is implemented in the current[1] interpolate_5pts, another in Toom-3.5, saving two half additions, implemented in interpolate_6pts. But in this particular case, I can not see it, because with the currently implemented sequence we compute one shift only, not two... Anyway the general idea is good, mix the recomposition phase and the interpolation, to save some half-operations. I'll give a real example I developed some times ago (but I did not have the time to really implement/test it). Again, the trick works perfectly for any Toom-and-a-half, the example is Toom-4.5, i.e. interpolation_8pts. I start from the matrix Niels wrote: >> For the next example, say we use eight points, 0, oo, 1, -1, 2, -2, >> 1/2, -1/2. The full matrix is >> >> 1 0 0 0 0 0 0 0 >> 1 1 1 1 1 1 1 1 >> 1 -1 1 -1 1 -1 1 -1 >> 1 2 4 8 16 32 64 128 >> 1 -2 4 -8 16 -32 64 -128 >> 128 64 32 16 8 4 2 1 >> 128 -64 32 -16 8 -4 2 -1 >> 0 0 0 0 0 0 0 1 Niels proposed the butterfly operations, I suggest something slightly different, removing also the values on the first and last columns, and with slightly different shifts, the result I need is: 1 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 0 1 0 1 0 0 1 0 4 0 16 0 0 0 0 1 0 4 0 16 0 0 16 0 4 0 1 0 0 0 0 16 0 4 0 1 0 0 0 0 0 0 0 0 1 As you can see, we have a lot of [x,0;0,x] blocks, this does not only mean we have two matrices that are identical and can be interpolated with one single function... we can do something better! We can "recompose" couples of lines! We have: |==== r0 =====| ||==== r1 =====| ||==== r2 =====| ||==== r3 =====| ||==== r4 =====| ||==== r5 =====| ||==== r6 =====| |==== r7 =====| and we recompose some pairs to get |==== r0 =====| ||==== r1 and r2 =====| ||==== r3 and r4 =====| ||==== r5 and r6 =====| |==== r7 =====| then we can interpolate the internal matrix 1 1 1 1 4 16 16 4 1 only once, operating on slices that are 3/2 times longer than the original ones, thus saving one half on _every_ operation after this step: additions, shifts, divisions... everything. Moreover, interlacing also the evaluation steps, we can also save on memory footprint: here is the strategy. - split in slices of "n" limbs - evaluate in {0, oo} (2n limbs each) - for each "a" - evaluate in {a, -a} (2n limbs + carry, each) - butterfly and remove first/last column - partially recompose (only 3n limbs + carry, n limbs [and a carry] freed) - interpolate the inner matrix For example, Toom-4.5 should need for the interpolation phase: 2 values of 2n limbs and 3 values of 3n limbs, a total of 13n limbs; instead of 8*2n=16n limbs... It should not be hard to write a single generic function doing butterfly/remove/recompose for any {a} (if a is a power of 2). Unfortunately, I have to underline it again... this strategy only works for the Toom-and-a-half algorithms, I mean unbalanced only :-( Have a nice day :-D Marco [1] http://gmplib.org:8000/gmp/file/79e81acfb9b3/mpn/generic/toom_interpolate_5pts.c#l134 -- http://bodrato.it/toom-cook/ _______________________________________________ gmp-devel mailing list gmp-devel@... http://gmplib.org/mailman/listinfo/gmp-devel |
|
|
Re: Toom symmetriesbodrato@... writes:
Unfortunately, I have to underline it again... this strategy only works for the Toom-and-a-half algorithms, I mean unbalanced only :-( Well, nothing stops us from over-sampling a balanced case in order to invoke this new interpolation strategy... -- Torbjörn _______________________________________________ gmp-devel mailing list gmp-devel@... http://gmplib.org/mailman/listinfo/gmp-devel |
|
|
Re: Toom symmetriesOn Jun 10, 2009, at 3:00 AM, bodrato@... wrote: > An addsub/2 can be useful here, currently the effect of > (a,b) <- ( (a+b)/2, (a-b)/2 ) > is given by the sequence: > a <- (a+b)/2 > b <- a-b > which implies one shift only. Ah yes of course, this is already better than what I suggested. david _______________________________________________ gmp-devel mailing list gmp-devel@... http://gmplib.org/mailman/listinfo/gmp-devel |
|
|
Re: Toom symmetriesOn Jun 10, 2009, at 3:00 AM, bodrato@... wrote: > There are a lot of tricks like this, the first one in Karatsuba > (due to?) > saving half an addition. There is one in Toom-3, saving half a sum > again, > and it is implemented in the current[1] interpolate_5pts, another in > Toom-3.5, saving two half additions, implemented in interpolate_6pts. Marco, At some point I skimmed over your paper with Zanoni on the optimality of Toom-Cook matrices; I recall your automated search did not take into account these kind of tricks that merge the interpolation and recomposition stages. Is this correct? Have you tried this since writing the paper? It seems that it shouldn't be difficult in theory to extend the search method, although I have no idea if it's practical (the search spaces are very large...?). david _______________________________________________ gmp-devel mailing list gmp-devel@... http://gmplib.org/mailman/listinfo/gmp-devel |
|
|
Re: Toom symmetriesAlle 14:08, mercoledì 10 giugno 2009, David Harvey ha scritto:
> On Jun 10, 2009, at 3:00 AM, bodrato@... wrote: > > There are a lot of tricks like this, the first one in Karatsuba > > (due to?) > > saving half an addition. There is one in Toom-3, saving half a sum > > again, > > and it is implemented in the current[1] interpolate_5pts, another in > > Toom-3.5, saving two half additions, implemented in interpolate_6pts. > > Marco, > > At some point I skimmed over your paper with Zanoni on the optimality > of Toom-Cook matrices; I recall your automated search did not take > into account these kind of tricks that merge the interpolation and > recomposition stages. Is this correct? Hi, I'm Alberto (Zanoni). Yes, it is. The paper deals only with the interpolation stage. We didn't consider its merging with evaluation and/or recomposition there. > Have you tried this since > writing the paper? It seems that it shouldn't be difficult in theory > to extend the search method, although I have no idea if it's > practical (the search spaces are very large...?). Just to know, in our case the search space - for interpolation only - for Toom-3.5 was analyzed in about one week of running time (while Toom-3 took just some seconds). It seems difficult to use the same tool from the very beginning of interpolation phase with higher Toom. In our paper, for higher Toom some "manual" steps had to be done, in order to have a sufficient number of zero entries in the matrix, so that the analysis could be done in reasonable time. This limitation was in practice the starting point to look for other techniques searching for optimization or generalization somewhere else (stages merging, use of the "divide by 2" technique, the "general" Toom using powers of 2,...) We didn't try to insert these techniques in our "old" automated search tool. Alberto _______________________________________________ gmp-devel mailing list gmp-devel@... http://gmplib.org/mailman/listinfo/gmp-devel |
|
|
Re: Toom symmetriesCiao,
> Unfortunately, I have to underline it again... this strategy only works > for the Toom-and-a-half algorithms, I mean unbalanced only :-( > > Well, nothing stops us from over-sampling a balanced case in order to > invoke this new interpolation strategy... Maybe... As soon as the Fool-Toom-Tool implement it, we will know if this is a good idea :-D Marco -- http://bodrato.it/ _______________________________________________ gmp-devel mailing list gmp-devel@... http://gmplib.org/mailman/listinfo/gmp-devel |
|
|
Re: Toom symmetriesbodrato@... writes:
> As you can see, we have a lot of [x,0;0,x] blocks, this does not only mean > we have two matrices that are identical and can be interpolated with one > single function... we can do something better! > We can "recompose" couples of lines! We have: That's clever. To see if I understand this, let's consider the simplest four-point interpolation first. The number we're trying to construct is x=f(B), with f(t) = x_0 + x_1 t + x_2 t^2 + x_3 t^3 and B is the power of two used for splitting. To start with, we are given w0 = f(0) = x0 w1 = f(1) = x0 + x1 + x2 + x3 w2 = f(-1) = x0 - x1 + x2 - x3 w3 = f(oo) = x3 First, a butterfly operation, replacing w1 and w2 by w1 = x1 + x3 w2 = x0 + x2 Then x2 = w2 - w0 and x1 = w1 - w3, so we can write x = w0 + (w1 - w3) B + (w2 - w0) B^2 + w3 B^3 = -(w3 B - w0) + (w1 + w2 B) B + (w3 B - w0) B^2 Let L w0 and H w0 be the low and high part of w0. Define y = w3 - H w0 z = w1 + B w2 Then we have x = Lw0 - y B + z B + y B^2 or graphically B^4 B^3 B^2 B 1 | | | | | | +---------+----+ | - y |Lw0 | +-----+---------+ | z | +---+---------------+ | y |-Lw0| +---------+----+ So the inputs to the recombination, at this stage, are y (2n limbs + sign), z (2n+1 limbs) and Lw0 (n limbs), for a total of 5n + 1 limbs. Compared to 8*n + 2 needed to represent the original w:s. We need scratch space for the recursive computations, and for the storage of z, but no more. (To compute z, put factors in the product area, first product in z, second in the product area, then do the butterflies and combination. w0 and w3 can be computed directly into the product area and combined into y. I don't fully understand the current toom32_mul, but maybe it's doing precisely this? The temporary allocation makes me suspicious, though: toom32_mul_itch returns 4*n + 2. toom32_mul does this, #define scratch_out scratch + 4 * n + 2 [...] TOOM22_MUL_N_REC (vm1, asm1, bsm1, n, scratch_out); The recursive call is to toom22_mul, which needs ~ 2n limbs of scratch space, according to toom22_mul_itch, and this area is beyond the scratch area that toom32_mul asks for. Regards, /Niels _______________________________________________ gmp-devel mailing list gmp-devel@... http://gmplib.org/mailman/listinfo/gmp-devel |
| Free embeddable forum powered by Nabble | Forum Help |