Sigmoidized Ginseng (pronounced "Jinc-Sinc") resampling
-
- Posts: 1944
- Joined: 2010-08-28T11:16:00-07:00
- Authentication code: 8675308
- Location: Montreal, Canada
Re: Sigmoidized Ginseng (pronounced "Jinc-Sinc") resampling
If you have set things up so that the alignment is the same throughout the computation, a number of the 16 texels are outside of the discs and consequently always have coefficient 0, which means they can be dropped. (Maybe this requires reflections of the data to put it in "standard position"---like is done in Nohalo---and so on to make it work and consequently is not worth it.)
Last edited by NicolasRobidoux on 2014-06-08T01:01:59-07:00, edited 2 times in total.
-
- Posts: 1944
- Joined: 2010-08-28T11:16:00-07:00
- Authentication code: 8675308
- Location: Montreal, Canada
Re: Sigmoidized Ginseng (pronounced "Jinc-Sinc") resampling
One last "manic perfectionist" thing: Some of the positions, you know ahead of time that they are within 1, or farther than 1. So, you could use a special weight function for these special cases and skip some branches for these "indexes".
-
- Posts: 1944
- Joined: 2010-08-28T11:16:00-07:00
- Authentication code: 8675308
- Location: Montreal, Canada
Re: Sigmoidized Ginseng (pronounced "Jinc-Sinc") resampling
You write beautifully clear code.
Re: Sigmoidized Ginseng (pronounced "Jinc-Sinc") resampling
NicolasRobidoux wrote: Suggestion: Compute the weights using the formulas used in ImageMagick's resize.c and save one flop.
Code: Select all
if (x < 1.0)
  return(resize_filter->coefficient[0]+x*(x*(resize_filter->coefficient[1]+x*resize_filter->coefficient[2])));
if (x < 2.0)
  return(resize_filter->coefficient[3]+x*(resize_filter->coefficient[4]+x*(resize_filter->coefficient[5]+x*resize_filter->coefficient[6])));
return(0.0);
It saved me like 20% of processing time! Indeed it works! Thanks.
-
- Posts: 1944
- Joined: 2010-08-28T11:16:00-07:00
- Authentication code: 8675308
- Location: Montreal, Canada
Re: Sigmoidized Ginseng (pronounced "Jinc-Sinc") resampling
Hyllian wrote: It saved me like 20% of processing time! Indeed it works! Thanks.
Standard polynomial evaluation trick: Horner's rule.
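For readers who want to try the Horner form outside a shader, here is a minimal C sketch. It assumes the seven coefficients come from the usual Mitchell-Netravali BC-cubic formulas; the names `CubicCoeffs`, `cubic_coeffs` and `cubic_weight` are made up for this illustration and are not ImageMagick's API.

```c
#include <math.h>

/* Hypothetical container for the seven coefficients of a two-piece
   BC-cubic, laid out in the same order as in the snippet quoted above. */
typedef struct {
  double c[7];
} CubicCoeffs;

/* Fill the coefficients from B and C (Mitchell-Netravali family). */
CubicCoeffs cubic_coeffs(double B, double C)
{
  CubicCoeffs k;
  k.c[0] = 1.0 - B / 3.0;            /* |x| < 1: constant term */
  k.c[1] = -3.0 + 2.0 * B + C;       /* |x| < 1: x^2 term      */
  k.c[2] = 2.0 - 1.5 * B - C;        /* |x| < 1: x^3 term      */
  k.c[3] = 4.0 * B / 3.0 + 4.0 * C;  /* 1 <= |x| < 2: constant */
  k.c[4] = -2.0 * B - 8.0 * C;       /* 1 <= |x| < 2: x term   */
  k.c[5] = B + 5.0 * C;              /* 1 <= |x| < 2: x^2 term */
  k.c[6] = -B / 6.0 - C;             /* 1 <= |x| < 2: x^3 term */
  return k;
}

/* Evaluate the weight with Horner's rule: each piece is a nested
   multiply-add chain instead of separately computed powers of x. */
double cubic_weight(const CubicCoeffs *k, double x)
{
  x = fabs(x);
  if (x < 1.0)
    return k->c[0] + x * (x * (k->c[1] + x * k->c[2]));
  if (x < 2.0)
    return k->c[3] + x * (k->c[4] + x * (k->c[5] + x * k->c[6]));
  return 0.0;
}
```

With B = 0 and C = 0.5 (Catmull-Rom), `cubic_weight` gives 1 at x = 0 and 0 at x = 1, which is a quick sanity check before porting the expression to a shader.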
-
- Posts: 1944
- Joined: 2010-08-28T11:16:00-07:00
- Authentication code: 8675308
- Location: Montreal, Canada
Re: Sigmoidized Ginseng (pronounced "Jinc-Sinc") resampling
NicolasRobidoux wrote: One last "manic perfectionist" thing: Some of the positions, you know ahead of time that they are within 1, or farther than 1. So, you could use a special weight function for these special cases and skip some branches for these "indexes".
What I mean is this:
I assume that you fix things so that the sampling point is within the convex hull of the four central input pixel locations within the 4x4. (I could figure this from your code but I'm too lazy.)
If so, you know right off the bat that these four closest input pixels cannot be at a distance of 2 or more (because sqrt(1+1)=sqrt(2)<2). This means that the third branch of the weight computation is not applicable to the four "inner" input pixels.
You also know right off the bat that the outer input pixels (the 16-4=12 that are not discussed above) cannot be at a distance that is less than 1. This means that the first branch of the weight computation is not applicable to the 12 "outer" input pixels.
Now, the weight computation for all input pixels has only two branches, instead of three. You should be able to exploit this to make the code faster. (This may require computing contributions one position at a time instead of looping. That is, getting speed out of this may require manually unrolling the loop that goes over all 16 input pixel positions.)
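A rough C sketch of the two-branch idea, assuming the sampling point lies inside the central cell of the 4x4 so that the distance bounds above hold. The function names and the coefficient layout (seven values, ordered as in the resize.c snippet) are assumptions made for this illustration.

```c
/* Cubic piece used for 0 <= x < 1, in Horner form. */
double piece_near(const double k[7], double x)
{
  return k[0] + x * (x * (k[1] + x * k[2]));
}

/* Cubic piece used for 1 <= x < 2, in Horner form. */
double piece_far(const double k[7], double x)
{
  return k[3] + x * (k[4] + x * (k[5] + x * k[6]));
}

/* General weight: three branches, valid for any x >= 0. */
double weight_general(const double k[7], double x)
{
  if (x < 1.0) return piece_near(k, x);
  if (x < 2.0) return piece_far(k, x);
  return 0.0;
}

/* Four inner pixels: the caller guarantees x <= sqrt(2) < 2,
   so the "return 0" branch can never be taken and is dropped. */
double weight_inner(const double k[7], double x)
{
  return (x < 1.0) ? piece_near(k, x) : piece_far(k, x);
}

/* Twelve outer pixels: the caller guarantees x >= 1,
   so the first piece can never be selected and is dropped. */
double weight_outer(const double k[7], double x)
{
  return (x < 2.0) ? piece_far(k, x) : 0.0;
}
```

Each specialized function is only correct on its guaranteed range, which is why using them means addressing the 16 positions individually rather than in a uniform loop. Whether this is actually faster depends on the compiler and the hardware.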
P.S.
This comment is not specifically about the unrolling: unless your library/compiler is really smart, you should probably rewrite
Code: Select all
color = mul(weights[0], float4x3(c00, c10, c20, c30));
color+= mul(weights[1], float4x3(c01, c11, c21, c31));
color+= mul(weights[2], float4x3(c02, c12, c22, c32));
color+= mul(weights[3], float4x3(c03, c13, c23, c33));
as
Code: Select all
color1 = mul(weights[0], float4x3(c00, c10, c20, c30));
color2 = mul(weights[1], float4x3(c01, c11, c21, c31));
color3 = mul(weights[2], float4x3(c02, c12, c22, c32));
color4 = mul(weights[3], float4x3(c03, c13, c23, c33));
color = ( color1 + color2 ) + ( color3 + color4 );
(Hopefully, I am not making incorrect assumptions about your computing environment. This is how I'd go at things if I were working with an HLSL programmer.)
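The same reorganization can be sketched in plain C for a single color channel. The flat 16-element layout and the names `row_dot`, `filter_chain` and `filter_tree` are assumptions for this illustration; they stand in for the `mul(weights[i], float4x3(...))` calls, which handle three channels at once.

```c
/* Weighted sum of one row of four samples (one color channel). */
float row_dot(const float *w, const float *c)
{
  return w[0] * c[0] + w[1] * c[1] + w[2] * c[2] + w[3] * c[3];
}

/* Sequential accumulation: every addition waits for the previous one. */
float filter_chain(const float w[16], const float c[16])
{
  float color = row_dot(w, c);
  color += row_dot(w + 4, c + 4);
  color += row_dot(w + 8, c + 8);
  color += row_dot(w + 12, c + 12);
  return color;
}

/* Pairwise accumulation: the two inner additions do not depend on
   each other, at the cost of keeping more intermediates alive. */
float filter_tree(const float w[16], const float c[16])
{
  float color1 = row_dot(w, c);
  float color2 = row_dot(w + 4, c + 4);
  float color3 = row_dot(w + 8, c + 8);
  float color4 = row_dot(w + 12, c + 12);
  return (color1 + color2) + (color3 + color4);
}
```

The two versions compute the same sum up to floating-point rounding. Which one runs faster is something to measure, since the pairwise form needs more temporaries.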
P.S. I don't like playing Sudoku, but I love doing this kind of optimization puzzle :)
Re: Sigmoidized Ginseng (pronounced "Jinc-Sinc") resampling
Nicolas, for some reason, your last optimization trick actually made the code slower (119 vs 129 cycles), measured using nvshaderperf. OTOH, the Horner's rule one was very good (119 vs 143 cycles).
Maybe my Cg compiler is smart for some of these tricks already, and dumb for others.
A question for you: that jinc2 filter I made, technically, should I call it ewa-lanczos2sharp?
-
- Posts: 1944
- Joined: 2010-08-28T11:16:00-07:00
- Authentication code: 8675308
- Location: Montreal, Canada
Re: Sigmoidized Ginseng (pronounced "Jinc-Sinc") resampling
Sounds like you're running out of "registers". "Red-black" tricks (this includes my initial suggestion about min and max being computed as
Code: Select all
min(min(.,.),min(.,.))
instead of
Code: Select all
min(.,min(.,min(.,.)))
) generally use more memory.
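To make the two groupings concrete, here is a small C sketch. The helper names `min2`, `min4_chain` and `min4_tree` are made up for this illustration.

```c
/* Minimum of two values. */
float min2(float a, float b)
{
  return (a < b) ? a : b;
}

/* Nested form: min(.,min(.,min(.,.))).
   Three comparisons, each depending on the previous result. */
float min4_chain(float a, float b, float c, float d)
{
  return min2(a, min2(b, min2(c, d)));
}

/* Balanced form: min(min(.,.),min(.,.)).
   The two inner comparisons are independent of each other,
   but both intermediate results must be held at the same time. */
float min4_tree(float a, float b, float c, float d)
{
  return min2(min2(a, b), min2(c, d));
}
```

Both forms return the same value; they differ only in dependency depth and in how many intermediates are live at once.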
If it's not too much to ask, could you try
Code: Select all
kolor = mul(weights[0], float4x3(c00, c10, c20, c30));
color = mul(weights[1], float4x3(c01, c11, c21, c31));
kolor += mul(weights[2], float4x3(c02, c12, c22, c32));
color += mul(weights[3], float4x3(c03, c13, c23, c33));
color += kolor;
Last edited by NicolasRobidoux on 2014-06-08T08:06:45-07:00, edited 2 times in total.
-
- Posts: 1944
- Joined: 2010-08-28T11:16:00-07:00
- Authentication code: 8675308
- Location: Montreal, Canada
Re: Sigmoidized Ginseng (pronounced "Jinc-Sinc") resampling
Hyllian wrote: A question for you: that jinc2 filter I made, technically, should I call it ewa-lanczos2sharp?
It's not really what I call EWA Lanczos2Sharp because it does not use Jinc and it does not use one of my standard deblurs.
It's a deblurred EWA Sinc-windowed Sinc 2-lobe. <- Too long for a short name.
So I don't know.
Re: Sigmoidized Ginseng (pronounced "Jinc-Sinc") resampling
NicolasRobidoux wrote: If it's not too much to ask, could you try
Code: Select all
kolor = mul(weights[0], float4x3(c00, c10, c20, c30));
color = mul(weights[1], float4x3(c01, c11, c21, c31));
kolor += mul(weights[2], float4x3(c02, c12, c22, c32));
color += mul(weights[3], float4x3(c03, c13, c23, c33));
color += kolor;
Sure, but no gain (113 vs 113 cycles).
I couldn't get rid of jaggies using ewa-cubic. This is the clown image at 4x that I got (B=0.0, C=0.5, Catmull-Rom):
[Image]
-
- Posts: 1944
- Joined: 2010-08-28T11:16:00-07:00
- Authentication code: 8675308
- Location: Montreal, Canada
Re: Sigmoidized Ginseng (pronounced "Jinc-Sinc") resampling
EWA Catmull-Rom is super jaggy. Some people have liked it for downsampling but I've never liked it, up or down.
Try EWA RobidouxSoft:
Code: Select all
B = (9-3*sqrt(2))/7 = 0.67962275898295921
C = (1-B)/2 = 0.1601886205085204
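If you would rather compute these constants than paste the decimals, a small C sketch follows. The struct `BCParams` and the function `robidoux_soft` are hypothetical names used only for this illustration.

```c
#include <math.h>

/* Hypothetical pair of BC-cubic parameters. */
typedef struct {
  double B;
  double C;
} BCParams;

/* RobidouxSoft parameters, computed from the closed forms above:
   B = (9 - 3*sqrt(2))/7 and C = (1 - B)/2. */
BCParams robidoux_soft(void)
{
  BCParams p;
  p.B = (9.0 - 3.0 * sqrt(2.0)) / 7.0;
  p.C = (1.0 - p.B) / 2.0;
  return p;
}
```

In a shader the two values would normally be baked in as constants; computing them once on the CPU side just avoids transcription errors.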
Re: Sigmoidized Ginseng (pronounced "Jinc-Sinc") resampling
NicolasRobidoux wrote: EWA Catmull-Rom is super jaggy. Some people have liked it for downsampling but I've never liked it, up or down. Try EWA RobidouxSoft:
Code: Select all
B = (9-3*sqrt(2))/7 = 0.67962275898295921
C = (1-B)/2 = 0.1601886205085204
Very soft, indeed. A bit too blurry:
[Image]
I have the feeling we can't get the ewa-lanczos quality using cubic.
-
- Posts: 1944
- Joined: 2010-08-28T11:16:00-07:00
- Authentication code: 8675308
- Location: Montreal, Canada
Re: Sigmoidized Ginseng (pronounced "Jinc-Sinc") resampling
Don't give up too fast.
Let's first try Keys cubics: Once you choose B, set C=(1-B)/2.
Start with Mitchell, which is the Keys cubic with B = 1/3.
Then, vary B until you're happy with what you get.
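One way to organize that search is to generate a list of Keys settings and try them in turn. Here is a minimal C sketch; the names `keys_c` and `keys_sweep` and the evenly spaced sampling of B are assumptions made for this illustration.

```c
/* Keys cubics: C is determined by B through C = (1 - B)/2. */
double keys_c(double B)
{
  return (1.0 - B) / 2.0;
}

/* Fill b_out and c_out with n evenly spaced Keys settings,
   with B running from b_min to b_max inclusive.
   Requires n >= 2; returns the number of pairs written. */
int keys_sweep(double b_min, double b_max, int n,
               double *b_out, double *c_out)
{
  if (n < 2)
    return 0;
  for (int i = 0; i < n; i++) {
    double B = b_min + (b_max - b_min) * i / (n - 1);
    b_out[i] = B;
    c_out[i] = keys_c(B);
  }
  return n;
}
```

B = 0 gives Catmull-Rom (C = 0.5), B = 1/3 gives Mitchell (C = 1/3), and B = 1 gives the cubic B-spline (C = 0), so a sweep over [0, 1] covers the range from sharpest to softest.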
Re: Sigmoidized Ginseng (pronounced "Jinc-Sinc") resampling
NicolasRobidoux wrote: Don't give up too fast. Let's first try Keys cubics: Once you choose B, set C=(1-B)/2. Start with Mitchell which is the Keys with B = 1/3. Then, vary B until you're happy with what you get.
No dice! There isn't a single Keys config that comes close to this ewa-lanczos (WA=0.4, WB=0.9) quality:
[Image]
I think there is a need to derive new cubic functions that switch between x=1.1 and x=1.3, and not at 1.0 and 2.0, which are the default points. But they need to be splines (so, the first derivative must be continuous at the switch point). I can't just change the switch points of the current cubic functions, because a discontinuity would arise. Just an idea.
-
- Posts: 1944
- Joined: 2010-08-28T11:16:00-07:00
- Authentication code: 8675308
- Location: Montreal, Canada
Re: Sigmoidized Ginseng (pronounced "Jinc-Sinc") resampling
I persist in thinking that if you vary B and C (possibly without sticking to Keys cubics) you'll find a combination that compares.
The only thing that could make a comparable result unreachable with 4x is that you extend your disc up to radius 2.5.
Last edited by NicolasRobidoux on 2014-06-09T00:35:26-07:00, edited 1 time in total.