Post-update addressing modes are available for many 56800E instructions. At optimization level 2 and above, the compiler attempts to locate register-based address expressions that change by a linear amount on each iteration of a loop. If such an expression is found and certain conditions are met, the compiler may replace the address update expression with a post-update addressing mode that is performed concurrently with the move or arithmetic operation. In compiler terminology this transformation is called `strength reduction': an operation is replaced with a cheaper one (fewer cycles or words). Address expressions are normally either address registers that have been loaded directly with the addresses of objects (variables) or address registers holding the calculated addresses of array elements. Array indices that vary by a regular, linear amount on each iteration of a loop are called `induction variables.' Induction variables are often eliminated completely when their function is replaced by a post-update addressing mode.
X:(Rn)+     Address is incremented by 1 (2 for move.l)
X:(Rn)-     Address is decremented by 1 (2 for move.l)
X:(Rn)+N    Address is incremented by value in N register
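As a source-level illustration of the transformation described above, the following sketch contrasts an indexed loop with a hand-written pointer form that behaves the way the X:(Rn)+ mode does: the access and the address increment happen together, and the index variable disappears. The function names sum_indexed, sum_pointer and check_sums are hypothetical, and the C code is only an analogy for what the optimizer does internally, not actual compiler output.

```c
#include <assert.h>
#include <stddef.h>

/* Indexed form: the element address is derived from the
   induction variable i on every iteration. */
int sum_indexed(const int *arr, size_t n)
{
    int sum = 0;
    size_t i;
    for (i = 0; i < n; i++)
        sum += arr[i];
    return sum;
}

/* Pointer form: the read and the address increment are written as one
   expression (*p++), which is the C analogue of X:(Rn)+.
   No separate index variable remains. */
int sum_pointer(const int *arr, size_t n)
{
    int sum = 0;
    const int *p = arr;
    const int *end = arr + n;   /* one past the last element */
    while (p < end)
        sum += *p++;
    return sum;
}

/* Hypothetical self-check: both forms must agree. */
void check_sums(void)
{
    int arr[] = { 13, 14, 18, 3, 7, 0, 1, 4, 11, 20 };
    size_t n = sizeof(arr) / sizeof(arr[0]);
    assert(sum_indexed(arr, n) == sum_pointer(arr, n));
}
```

Both functions compute the same result. The point of the pointer form is only to make visible, in C, the single combined access-and-update step that a post-update addressing mode provides.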
Some programming guidelines which promote the successful targeting of the post-update addressing mode are:
In the following listing, a simple loop that calculates the sum of elements in a local array is shown. For this example, the induction variable `i' is completely eliminated because:
    int i;
    int sum=0;
    int arr[] = { 13,14,18,3,7,0,1,4,11,20 };
    int sz = sizeof(arr)/sizeof(int);
    for (i=0; i < sz; i++)
        sum += arr[i];
    printf ( "Sum is %d\n",sum );

Assembly output:

            adda    #<10,SP         ;allocate stack
            move.w  #<0,B           ;sum = 0
            adda    #-9,SP,R1       ;&arr[0]->R1
            moveu.w #F47,R0         ;temp F47->R0
            do      #<10,>_L8_0     ;compiler generated init loop
            move.w  X:(R0)+,A       ;initialize arr[]
            move.w  A1,X:(R1)+
    _L8_0:
            adda    #-9,SP,R0       ;&arr[0]->R0
            do      #<10,>_L8_1     ;for loop
            move.w  X:(R0)+,A       ;arr[i]->A
            add     A,B             ;sum = arr[i]+sum
    _L8_1:
            adda    #<2,SP          ;printf call setup
            moveu.w #@lb(F54),N     ;string temp to stack
            move.w  N,X:(SP)
            move.w  B1,X:(SP-1)     ;sum to stack
            jsr     >Fprintf        ;call printf
            suba    #<2,SP          ;restore stack
The following listing shows a case where strength reduction of the address expression was not possible, mainly because the access to the array is conditionally executed in the loop. Also, the induction variable `i' is used in the `if' test, but this would not normally prevent a post-update transformation from occurring.
    for (i=0; i < sz; i++)
        if ( i & 1 )
            sum += arr[i];

Assembly output:

            do      #<10,>_L8_1     ;for loop
            brclr   #1,Y0,<_L8_2    ;if ( i & 1 )
            move.w  X:(R0),A        ;arr[i]->A
            add     A,B             ;sum = arr[i]+sum
    _L8_2:
            adda    #<1,R0          ; &arr = &arr + 1;
            add.w   #<1,Y0          ; i = i + 1
            nop
    _L8_1:
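One possible way to remove the conditional access from the loop above is to step the index by 2 so that every iteration reads the array unconditionally. The sketch below shows both forms. The function names are hypothetical, and whether the compiler then selects a post-update mode such as X:(Rn)+N for the strided form is an assumption that the text does not confirm.

```c
#include <assert.h>

/* Original shape: the array access is guarded by a test on i,
   so it does not execute on every iteration. */
int sum_odd_conditional(const int *arr, int n)
{
    int sum = 0;
    int i;
    for (i = 0; i < n; i++)
        if (i & 1)
            sum += arr[i];
    return sum;
}

/* Restructured shape (assumption: only odd indices are wanted):
   start at 1 and step by 2, so the access is unconditional and the
   address changes by a constant amount each iteration. */
int sum_odd_strided(const int *arr, int n)
{
    int sum = 0;
    int i;
    for (i = 1; i < n; i += 2)
        sum += arr[i];
    return sum;
}

/* Hypothetical self-check: both shapes must agree. */
void check_odd_sums(void)
{
    int arr[] = { 13, 14, 18, 3, 7, 0, 1, 4, 11, 20 };
    assert(sum_odd_conditional(arr, 10) == sum_odd_strided(arr, 10));
}
```

The two functions visit the same elements. Only the second gives the optimizer an access that runs on every pass through the loop.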
The following listing shows another situation in which strength reduction fails to find a post-update opportunity: the loop or induction variable is defined (updated) more than once within the loop.
    for (i=0; i < sz; i++)
        sum += arr[i++];

Assembly output:

            move.w  #<0,A           ; i=0
    _L8_1:
            move.w  A1,B            ; i -> temp
            add.w   #<1,B           ; temp++
            move.w  A1,N            ; temp++ -> N
            adda    #-9,SP,R0       ; &arr[0] -> R0
            move.w  X:(R0+N),A      ; arr[temp++] -> A
            add     A,Y0            ; sum = arr[i++] + sum
            move.w  B1,A            ; temp++ -> i
            add.w   #<1,A           ; i = i + 1
            cmp.w   #<10,A
            blt     <_L8_1          ; i < 10 ?
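The loop above updates i in two places, which amounts to a single step of 2 per iteration. A minimal sketch of an equivalent form with one update is shown below. The function names are hypothetical, and the text does not state what code the compiler generates for the single-update form, so treat any expected improvement as an assumption to verify in the assembly output.

```c
#include <assert.h>

/* Original shape: i is incremented in the loop body and again in the
   for statement, so the induction variable has two definitions. */
int sum_double_update(const int *arr, int n)
{
    int sum = 0;
    int i;
    for (i = 0; i < n; i++)
        sum += arr[i++];
    return sum;
}

/* Single-update shape: the same elements (indices 0, 2, 4, ...) are
   visited, but i is defined only once per iteration. */
int sum_single_update(const int *arr, int n)
{
    int sum = 0;
    int i;
    for (i = 0; i < n; i += 2)
        sum += arr[i];
    return sum;
}

/* Hypothetical self-check: both shapes must agree. */
void check_even_sums(void)
{
    int arr[] = { 13, 14, 18, 3, 7, 0, 1, 4, 11, 20 };
    assert(sum_double_update(arr, 10) == sum_single_update(arr, 10));
}
```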
The following listing demonstrates a simple delay line loop that is structured so that post-update addressing is impossible. The final store to memory in the loop uses a memory plus displacement addressing mode, move.w A1,X:(R0+1), which does not allow post-update addressing. As written, the loop takes approximately 29 cycles and 9 words for NTAPS=6.
    for (ii = NTAPS - 2; ii >= 0; ii--)
    {
        z[ii + 1] = z[ii];
    }

Assembly output:

            do      #<5,>_L12_1     ; for ()
            move.w  Y0,R0           ; ii -> R0
            adda    R3,R0           ; &z[0] + ii
            move.w  X:(R0),A        ; z[ii] -> A
            move.w  A1,X:(R0+1)     ; z[ii] -> z[ii + 1]
            sub.w   #<1,Y0          ; ii--
    _L12_1:
The loop in the above listing can be rewritten slightly, as shown in the following listing, to allow much more efficient processing. The idea is to make the final load or store in the loop an instruction that has a post-update variant. The rewritten loop takes 17 cycles and 8 words.
    int *p1 = &z[NTAPS-1];
    for (ii = NTAPS - 2; ii >= 0; ii--)
    {
        *p1-- = z[ii];
    }

Assembly output:

            tfra    R1,R3           ;&z[NTAPS-1] -> R3
            adda    #-5,SP,R0       ;&z[NTAPS-2] -> R0
            tfra    R0,R2           ;R0 -> R2
            do      #<5,>_L9_1      ;for ()
            move.w  X:(R2)-,B       ;z[ii] -> B
            move.w  B1,X:(R3)-      ;B -> z[ii+1]
    _L9_1:
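To confirm that the rewrite only changes how the store address is expressed, the two delay line loops can be wrapped in functions and compared. This is a minimal sketch: the function names and the ntaps parameter (standing in for NTAPS, assumed to be at least 1) are hypothetical and are not part of the original listings.

```c
#include <assert.h>

/* Displacement form: the store target is z[ii + 1], an address
   computed as base plus index plus one. */
void shift_indexed(int *z, int ntaps)
{
    int ii;
    for (ii = ntaps - 2; ii >= 0; ii--)
        z[ii + 1] = z[ii];
}

/* Post-decrement form: the store goes through p1, which is stepped
   down in the same expression (*p1--), as in the rewritten listing.
   Assumes ntaps >= 1 so that &z[ntaps - 1] is a valid address. */
void shift_pointer(int *z, int ntaps)
{
    int *p1 = &z[ntaps - 1];
    int ii;
    for (ii = ntaps - 2; ii >= 0; ii--)
        *p1-- = z[ii];
}

/* Hypothetical self-check: both forms must leave the same contents. */
void check_shift(void)
{
    int a[6] = { 1, 2, 3, 4, 5, 6 };
    int b[6] = { 1, 2, 3, 4, 5, 6 };
    int k;
    shift_indexed(a, 6);
    shift_pointer(b, 6);
    for (k = 0; k < 6; k++)
        assert(a[k] == b[k]);
}
```

Each function moves every element up by one position and leaves z[0] unchanged, so the rewrite preserves behavior while giving the compiler a store that maps onto a post-update addressing mode.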