ABSTRACT
A programmable address arithmetic unit and method for use on microprocessors, microcontrollers, and digital signal processors is described. The addressing arithmetic unit incorporates a programmable logic array or other programmable device coupled to address registers and the instruction stream, the address unit being responsive to commands in the processor's instruction set. A first set of instructions control the initialization and configuration of the address arithmetic unit logic. A second set of instructions reference operands using one or more addressing modes that calculate the operand's effective address using the logic programmed by said first set of instructions.
DETAILED DESCRIPTION OF THE INVENTION
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
The present invention improves upon the prior art by supplementing the fixed address arithmetic units of the prior art with a programmable address arithmetic unit (programmable AAU). A programmer using the programmable AAU can create new addressing modes as needed to efficiently implement new signal processing algorithms without having to wait for DSP hardware designs to “catch up” with algorithm development. For example, in the prior art an AAU could be used by an instruction to provide an auto-increment addressing mode to automatically increment by some fixed amount a memory address stored in an address register. In an auto-increment addressing mode, each time the memory address contained in the address register is used, say in a store operation, the address is automatically incremented by some fixed amount. Using an auto-increment addressing mode, it is relatively easy for a programmer to generate a sequence of addresses such as 0, 1, 2, 3, etc.
Using the present invention, the AAU, which is the logic block that increments the address, is either replaced or supplemented by a programmable AAU. The programmable AAU can be programmed such that an auto-update instruction will update (e.g. increment) an address according to the needs of a programmer implementing a new signal processing algorithm. For example, in implementing a specific algorithm, a programmer may require that an address increment forward by eight on every even clock cycle, backwards by four on every odd clock cycle, and jump forward an additional four on every fourth clock cycle, thus generating the sequence of addresses 0, 8, 4, 12, 12, 20, 16, etc. In the prior art, generating this sequence required many DSP instructions and many clock cycles. With the present invention, a programmer can program the address arithmetic unit to automatically generate the desired sequence such that repetitive use of an auto-update type of instruction (e.g. STORE R,R++) generates the desired sequence. In a preferred embodiment, a new address in the sequence can be generated in one clock cycle.
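The update rule in this example can be sketched in software to check the quoted sequence. The sketch below is only an illustration: the function names are hypothetical, and it assumes that clock cycles are counted from zero, so that the first update falls on an even cycle and the "fourth" cycle is the one numbered 3, 7, 11, and so on.

```python
def next_address(address, cycle):
    """Apply one auto-update step; `cycle` counts updates from 0 (assumption)."""
    step = 8 if cycle % 2 == 0 else -4   # +8 on even cycles, -4 on odd cycles
    if cycle % 4 == 3:                   # every fourth cycle jumps an extra +4
        step += 4
    return address + step


def address_sequence(start, count):
    """Return the first `count` addresses produced by repeated auto-updates."""
    addresses = [start]
    for cycle in range(count - 1):
        addresses.append(next_address(addresses[-1], cycle))
    return addresses


print(address_sequence(0, 7))  # [0, 8, 4, 12, 12, 20, 16]
```

In the programmable AAU this rule would be held in the unit's logic rather than executed as instructions, which is what allows one new address per auto-update.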
Another example of auto-increment type of address arithmetic is found in the bit reversal addressing modes that are commonly used to implement FFT algorithms. Bit reversal, loosely speaking, provides for reversing the order of the least significant bits in an address register. Given an 8-bit register containing the value 01001000b (the b suffix meaning binary), reversing the order of the lowest four bits in the register gives the value 01000001b. With a programmable AAU, as disclosed herein, a DSP programmer can provide for other bit rearrangement schemes such as rotating the lowest four bits by two, which would turn 01001000b into 01000010b. Such a rotation is not provided by present DSP architectures, but is very useful when performing matrix calculation as shown below in the text relating to FIGS. 4A-4C.
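The two bit rearrangements described above can be checked with a short sketch. The helper names are hypothetical and the code models only the arithmetic, not the hardware; both helpers leave the upper bits of the register untouched.

```python
def reverse_low_bits(value, width):
    """Reverse the order of the `width` least significant bits of `value`."""
    mask = (1 << width) - 1
    low = value & mask
    reversed_low = 0
    for _ in range(width):
        reversed_low = (reversed_low << 1) | (low & 1)
        low >>= 1
    return (value & ~mask) | reversed_low


def rotate_low_bits(value, width, amount):
    """Rotate the `width` least significant bits of `value` by `amount` places."""
    mask = (1 << width) - 1
    low = value & mask
    amount %= width
    rotated = ((low << amount) | (low >> (width - amount))) & mask
    return (value & ~mask) | rotated


print(format(reverse_low_bits(0b01001000, 4), "08b"))    # 01000001
print(format(rotate_low_bits(0b01001000, 4, 2), "08b"))  # 01000010
```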
A DSP program may access (use) the addressing functions programmed into the programmable AAU by using existing auto-update addressing modes of the DSP. Alternatively, a DSP program may access the programmable AAU addressing functions through new auto-update addressing modes provided by the DSP specifically for accessing the programmable AAU. In the latter case, new assembler mnemonics are advantageously provided for writing programs that use the new addressing modes. In a preferred embodiment, the mnemonics “*+” and “*−” refer to auto-increment and auto-decrement addressing modes and the new assembler mnemonic “*++” refers to a programmable auto-update mode provided by the programmable AAU. Alternatively, the programmable AAU may advantageously provide a plurality of programmable auto-update modes accessed by an instruction that selects the “current” programmable address mode from the plurality of programmable auto-update modes. In a preferred embodiment, the current programmable addressing mode is specified by a “select address mode” instruction with an assembler mnemonic “SAM” such that a current programmable addressing mode number two is specified as “SAM 2”. Alternatively, the programmable AAU may advantageously provide a plurality of programmable auto-update modes accessed by a plurality of new assembler mnemonics (e.g. “*++*1”, “*++*2”, etc.).
Operation of the programmable AAU is most easily explained by comparing a DSP without the programmable AAU to a DSP with the programmable AAU. FIG. 1 is a block diagram that illustrates a conventional DSP having a conventional (non-programmable) AAU. The AAU may be a simple arithmetic logic unit that provides addition and shifting operations and, optionally, may provide more complex operations such as bit-reversal. A register set comprising address registers AR0, AR1, . . . ARn provides data to an input of the AAU. A bi-directional data path connects the register set to a data bus. An output of the AAU also provides data to the register set. A second output of the register set provides data to a first input of an address multiplexer. A second input of the address multiplexer receives data from a program bus. An output of the address multiplexer is provided to an address input of a data memory. The data memory is provided with a bi-directional data path to the data bus. The data bus provides data to an input of a multiply-accumulator (MAC), and an output of the MAC provides data back to the data bus. The data bus also provides data to an input of an address register pointer. An output of the address register pointer selects a register from the register set to use as the default address register. A bi-directional data path connects the data bus with the program bus. An output of a program counter provides an address to a program memory which, in turn, provides an instruction to the program bus. An instruction decoder receives data from the program bus.
The processor is typical of high performance digital processors that are based on a Harvard-type architecture. The Harvard architecture improves processing throughput by maintaining two separate memory bus structures: the program bus and the data bus. The present invention may also be used with other architectures. A program is stored in the program memory as a sequence of instructions. The program bus carries the instruction code and immediate operands from the program memory. Program data is stored in the data memory and carried by the data bus, which interconnects various elements such as the MAC and the register set.
The MAC, comprising a multiplier, an Arithmetic Logic Unit (ALU), an accumulator, and one or more shifters, is the primary arithmetic computational unit of the processor. The ALU is a general-purpose arithmetic unit which can perform operations such as add, subtract, Boolean logic, and shifting operations. A first input of the ALU receives data from the data bus, from the program bus (e.g. during immediate instructions which have data as part of the instruction), or from the multiplier. The accumulator stores the output from the ALU and also provides data to a second input of the ALU. The multiplier provides two's complement multiplication in a single instruction cycle.
The processor uses the program counter to step through a sequence of program instructions stored in the program memory. To fetch a program instruction, the processor increments the program counter. The instruction referenced by the program counter is fetched from the program memory and placed on the program bus. The decoder receives the instruction data from the program bus and decodes the instruction. The decoder then directs the other elements of the processor to perform the functions specified by the instruction. For example, when the instruction on the program bus is the “STORE AR1,*+” instruction discussed above, the processor performs the following actions (assuming that AR0 is the default address register): (1) set the multiplexer to select an address from the default address register (e.g. AR0), (2) put the data in AR1 on the data bus, (3) store the data on the data bus in the data memory at the address specified by the default address register, (4) use the AAU to increment the address in the default address register. These four steps may be performed during multiple clock cycles, or one or more of the steps may be performed during a single clock cycle.
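The four actions listed for the “STORE AR1,*+” instruction can be modeled as a small sketch. Everything here is an illustrative assumption rather than a description of any real instruction set: the function name, the use of a Python list for the register set and a dictionary for the data memory, and the fixed increment of one.

```python
def store_auto_increment(address_registers, data_memory, arp, source, step=1):
    """Model of a store with auto-increment through the default address register.

    `arp` is the index of the default address register and `source` is the
    index of the register whose contents are stored (both hypothetical names).
    """
    address = address_registers[arp]         # (1) multiplexer selects the default address register
    data = address_registers[source]         # (2) source register contents go onto the data bus
    data_memory[address] = data              # (3) data bus contents are stored at that address
    address_registers[arp] = address + step  # (4) the AAU increments the default address register


address_registers = [0x0010, 0x1234]  # AR0 (default address register), AR1 (data to store)
data_memory = {}
store_auto_increment(address_registers, data_memory, arp=0, source=1)
store_auto_increment(address_registers, data_memory, arp=0, source=1)
print(hex(address_registers[0]))  # 0x12
print(sorted(data_memory))        # [16, 17]
```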
FIG. 2 illustrates one embodiment of a processor that provides programmable address calculation using a programmable AAU. The processor of FIG. 2 provides all of the elements of the processor of FIG. 1, except that the output of the AAU is not provided directly back to the register set; rather, the output of the AAU is provided to a first input of a multiplexer. The processor also provides the programmable AAU, which receives input data from the register set. An output of the programmable AAU is provided to a second input of the multiplexer. An output of the multiplexer is provided to the register set.
The programmable AAU may be a programmed logic array (PLA), a field programmable gate array (FPGA), a micro sequencer, or any other programmable function block. The programmable AAU is responsive to the instruction set of the processor and to the data in the register set. The programmable AAU produces an output which is a function of the input from the register set and programming information stored in an AAU program memory in the programmable AAU. The programmable AAU may contain combinatorial logic as well as internal registers and feedback paths to implement sequential logic functions. By including the data paths shown in FIG. 2, addresses stored in the address register set can be manipulated efficiently in hardware without the need for complex indexing software, and without the need for specialized hardware. For example, the programmable AAU can be programmed to provide automatic address indexing for FFT processing, Viterbi decoding, discrete cosine transforms, circular buffers, etc.
In a preferred embodiment, the programmable AAU is user programmable and incorporates a memory to store the programmable AAU program. The memory may be any memory technology, including Random Access Memory, an Erasable Programmable Read Only Memory, an Electrically Erasable Programmable Read Only Memory, a Programmable Read Only Memory, Fusible Links, or Anti-Fuses. To program the programmable AAU, data is loaded into the memory. In one embodiment, a special processor instruction is used to insert data into the memory. In an alternate embodiment, the memory is mapped into the memory space of the program memory or the data memory such that the programmable AAU memory can be programmed simply by writing to the mapped memory locations. In yet another embodiment, the programmable AAU memory is loaded by a direct memory access operation. The memory may be erasable and rewritable such that a programmer can modify the program stored in the memory, or the memory may be write-once memory which cannot be changed once a program has been loaded.
Crossbar Embodiment
FIG. 3A illustrates a functional block diagram of one embodiment of the programmable AAU. The embodiment illustrated in FIG. 3A is a programmable AAU which is configured as a cross-bar switch that accepts a 16-bit input word and rearranges the bits therein to produce a 16-bit output word. The programmable AAU is programmable such that the bits can be rearranged in any order. For purposes of illustration, the programmable AAU is shown in FIG. 3A as being programmed to reverse all of the bits in the 16-bit word (as shown in FIG. 3B). The input and output words each consist of 16 bits labeled bit 0 (least significant) through bit 15 (most significant). The programmable AAU has 16 horizontal lines, each line corresponding to a bit in the input word, and 16 vertical lines, each line corresponding to a bit in the output word. A connection between a vertical line and a horizontal line is indicated by a dot at the junction between the lines. The programmable AAU is programmed by loading data into four program registers: PR0, PR1, PR2, and PR3. Each of the program registers PR0-PR3 is one word (sixteen bits). In this embodiment, the AAU memory comprises the program registers PR0-PR3. Bits 0-3 of the register PR0 determine which bit in the input word is mapped to bit 0 in the output word, bits 4-7 of the register PR0 determine which bit in the input word is mapped to bit 1 in the output word, etc. This sequence is continued through all of the bits in the register PR0 and then through the registers PR1-PR3. Thus, bits 0-3 of the register PR1 determine which bit in the input word is mapped to bit 4 in the output word, and so forth. As shown, the programmable AAU can map any bit in the input word to any bit in the output word.
For example, to simply map the input bits directly to the corresponding output bits (input bit 0 to output bit 0, input bit 1 to output bit 1, etc.), the registers PR0-PR3 would be loaded as follows: PR3=FEDCh (where “h” indicates hexadecimal notation), PR2=BA98h, PR1=7654h, PR0=3210h. To perform a bit reversal, as shown in FIG. 3B, where input bit 0 is mapped to output bit 15, input bit 1 is mapped to output bit 14, etc., the registers are loaded as follows: PR3=0123h, PR2=4567h, PR1=89ABh, PR0=CDEFh.
The programmable AAU thus provides the capability to programmably permute the bits in a 16-bit word. Using the cross-bar programmable AAU as the programmable AAU in the processor shown in FIG. 2 allows a programmer to programmably permute the bits in the address registers AR0-ARn in the register set. The ability to arbitrarily permute the bits in an address register provides many capabilities not seen in the prior art. For example, as shown in FIGS. 3C and 3D, the programmable AAU can be programmed to provide an output which consists of a two bit rotate on only the four least significant bits in the input word. To accomplish this, the registers PR0-PR3 are loaded as follows: PR3=FEDCh, PR2=BA98h, PR1=7654h, PR0=1032h. This function is useful for generating addresses in a matrix when performing matrix arithmetic.
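The register loadings quoted above can be checked against a small software model of the cross-bar. This is a sketch under one stated assumption: the four-bit field at position k within the concatenated registers PR0-PR3 (PR0 holding the lowest fields) names the input bit that drives output bit k, which is the reading that makes the quoted identity, reversal, and rotate values consistent. The function and constant names are hypothetical.

```python
def crossbar(word, program_registers):
    """Permute the bits of a 16-bit word.

    `program_registers` is [PR0, PR1, PR2, PR3]. Assumption: the 4-bit field
    for output bit k holds the number of the input bit routed to it.
    """
    output = 0
    for out_bit in range(16):
        register = program_registers[out_bit // 4]
        in_bit = (register >> (4 * (out_bit % 4))) & 0xF
        output |= ((word >> in_bit) & 1) << out_bit
    return output


# Register values quoted in the text, listed in the order PR0, PR1, PR2, PR3.
IDENTITY = [0x3210, 0x7654, 0xBA98, 0xFEDC]
REVERSE = [0xCDEF, 0x89AB, 0x4567, 0x0123]
ROTATE_LOW4_BY2 = [0x1032, 0x7654, 0xBA98, 0xFEDC]

print(hex(crossbar(0x1234, IDENTITY)))         # 0x1234
print(hex(crossbar(0x0001, REVERSE)))          # 0x8000
print(hex(crossbar(0x0048, ROTATE_LOW4_BY2)))  # 0x42
```

The last line reproduces the earlier example in which 01001000b becomes 01000010b when only the lowest four bits are rotated by two.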
FIG. 4A illustrates a matrix “A” having sixteen elements arranged in four rows and four columns. The elements of the matrix A are denoted A(0,0), A(0,1) . . . A(i,j) . . . A(3,3), where the “i” index indicates the row and the “j” index indicates the column. FIG. 4B illustrates how the matrix would typically be stored in memory. As shown in FIG. 4B, the sixteen elements of the matrix are laid out sequentially in memory from address 32d (where the “d” indicates decimal), or 0020h, through address 47d (002Fh). Note that the twelve most significant bits of the memory address for each element in the matrix are always the same and that the lowest four bits of the memory address for each element correspond to the “distance” from the first element. Thus, as shown in FIG. 4C, the memory address of an element can be interpreted as the combination of a base address and a displacement (or index). Any square matrix having 2^n elements per row will have an index field that is 2n bits wide. The address of an element in such a matrix may be expressed in base-index form similar to FIG. 4C as long as the matrix is stored in memory at a location where the index of the first element of the matrix is zero.
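The base-index interpretation can be illustrated with a short sketch. It assumes row-major storage (row index in the upper half of the index field), which the text does not state explicitly, and the function names are hypothetical.

```python
BASE = 0x0020  # address of A(0,0) in the example
N = 2          # the matrix has 2**N = 4 elements per row


def element_address(i, j, base=BASE, n=N):
    """Address of A(i,j), assuming row-major layout and a zero index at `base`."""
    index = (i << n) | j  # 2n-bit index field: row bits above column bits
    return base | index


def split_address(address, n=N):
    """Split an address into its (base, index) parts."""
    index_mask = (1 << (2 * n)) - 1
    return address & ~index_mask, address & index_mask


print(hex(element_address(0, 0)))  # 0x20
print(hex(element_address(3, 3)))  # 0x2f
print(split_address(0x0027))       # (32, 7)
```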
It is common in computer programs that deal with matrices to interchange elements within the matrix. One very common type of interchange is the transpose wherein A(i,j) is interchanged with A(j,i) (e.g. interchange A(1,3) with A(3,1)). Typically, a programmer needing to transpose elements will generate the address of one of the elements to be transposed (e.g. the address of A(1,3)), and then use that address to compute the address of the transpose element (e.g. A(3,1)). Addressing arithmetic is needed to efficiently compute the address of A(j,i) since the matrix is actually stored as a linear array in memory. Two additions and a multiply are needed to compute the transpose address using standard techniques. If the matrix size is a power of two, then two additions and a barrel shift are required. Since the AAU on a typical DSP, such as the TMS320C2x, does not support multiplies and shifts, multiple clock cycles are needed to compute the address in the conventional way.
Table 1 lists the index portion of the memory address for each element in the matrix which can be transposed. Each line of Table 1 contains one transpose pair. Examination of the indices in Table 1 reveals that the indices of each transpose pair are related by a two bit rotate, either left or right. Given the index for one element, the index for the transpose element can be computed by a simple two bit rotate. Recall, however, that the base portion of the memory address of each matrix element is the same. Thus, to generate the address of the transpose element, only the lowest four bits (the index portion) of the address are rotated.
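The two bit rotate relationship can be verified exhaustively for the 4×4 example. The sketch below assumes the row-major layout starting at 0020h described earlier; the function names are hypothetical.

```python
BASE = 0x0020
N = 2  # 2**N elements per row, so the index field is 2*N bits wide


def element_address(i, j):
    """Address of A(i,j) under the assumed row-major layout."""
    return BASE | (i << N) | j


def transpose_address(address, n=N):
    """Rotate only the 2n-bit index field by n bits, leaving the base unchanged."""
    width = 2 * n
    mask = (1 << width) - 1
    index = address & mask
    rotated = ((index << n) | (index >> n)) & mask
    return (address & ~mask) | rotated


# Every element's rotated address equals the address of its transpose element.
all_match = all(
    transpose_address(element_address(i, j)) == element_address(j, i)
    for i in range(4)
    for j in range(4)
)
print(all_match)                       # True
print(hex(transpose_address(0x0027)))  # 0x2d
```

Here 0027h is the address of A(1,3) and 002Dh is the address of A(3,1).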
Table 2 lists the assembly code and the number of clock cycles used to directly convert a matrix element address to a transpose address on the TMS320C2x processor. Since the TMS320C2x processor does not provide an instruction to rotate only the lowest four bits of a register, the code in Table 2 requires many instructions to mask off portions to be rotated, save the masked portions, perform the rotate, etc. As shown in Table 2, generating the transpose address uses fourteen clock cycles. (One skilled in the art will realize that for a small matrix, the code in Table 2 could be streamlined by using a lookup table. However, a lookup table would be as large as the matrix, and thus quickly becomes impractical as the size of the matrix increases. The code in Table 2 is therefore considered to be more representative of real world code for a medium to large sized matrix.)
By contrast, the programmable AAU shown in FIG. 3C can calculate the transpose address in one clock cycle. Table 3 lists the code to compute the transpose address using a TMS320C2x to which the programmable AAU has been added, as shown in FIG. . The code in Table 3 assumes that the PR0-PR3 registers of the programmable AAU are mapped into the program memory of the processor. Lines 6-9 in Table 3 load the registers PR0-PR3. Alternatively, the TMS320C2x processor could be modified by the addition of a new instruction to load the PR registers. For example, an instruction “LPR PR0, >1032” could be designed to load the register PR0 with the value 1032h. The only executable statement in Table 3 is the MAR instruction in line 12. Comparing the code in Table 2 with the code in Table 3 shows that the programmable AAU reduces the time needed to compute the transpose address from 14 clock cycles to 1 clock cycle.
In many cases, the one clock cycle MAR instruction shown in the macro in line 12 of Table 3 is not needed because the address can be computed using an auto-update address mode. An auto-update address mode can be accessed from most instructions in the DSP instruction set. For example, consider the swap operation wherein a matrix element and its transpose element are interchanged (e.g. the value of A(1,3) is interchanged with the value of A(3,1)). Table 4 shows the code needed to perform the swap using the TMS320C2x assuming the auxiliary register pointer is initially set to zero. The swap requires eighteen cycles, 14 cycles to generate the transpose address and four cycles to do the actual swap. The code also requires four registers, one to hold the address, one to hold the transpose address, and two to hold the data.
By contrast, Table 5 lists the code to perform the swap operation on a TMS320C2x to which the programmable AAU has been added. The TMS320C2x assembler uses the notation “*+” to indicate auto-increment addressing. The new “*++” notation in Table 5 is similar to the auto-increment notation “*+” except that “*++” instructs the processor to use the programmable AAU. Assuming that the registers PR0-PR3 have been properly loaded, the programmable AAU will compute the transpose address. Thus, the swap operation in Table 5 requires only four clock cycles because the calculation of the transpose addresses can be performed simultaneously with the load and store operations. Further, the code in Table 5 only uses three registers, one to hold the addresses and two to hold the data. Only one register is needed to hold the addresses because on each instruction, the address in the register is converted into the address of the corresponding transpose element.
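The idea behind the four-access swap can be sketched as follows. This is not the code of Table 5, which is not reproduced here; it is a hypothetical model in which every memory access is followed by an auto-update that converts the single address register into the transpose address. The layout (row-major at 0020h) and all names are assumptions.

```python
BASE = 0x0020
N = 2  # 4x4 matrix: 2**N elements per row


def element_address(i, j):
    """Address of A(i,j) under the assumed row-major layout."""
    return BASE | (i << N) | j


def auto_update(address, n=N):
    """Model of the programmable auto-update: rotate the 2n-bit index by n bits."""
    mask = (1 << (2 * n)) - 1
    index = address & mask
    return (address & ~mask) | (((index << n) | (index >> n)) & mask)


def swap_with_transpose(memory, address):
    """Swap an element with its transpose using four accesses and one address register."""
    x = memory[address]             # load the element, then auto-update
    address = auto_update(address)
    y = memory[address]             # load the transpose element, then auto-update
    address = auto_update(address)
    memory[address] = y             # store into the original location, then auto-update
    address = auto_update(address)
    memory[address] = x             # store into the transpose location, then auto-update
    return auto_update(address)


# Fill the matrix so that A(i,j) holds the value 10*i + j.
memory = {element_address(i, j): 10 * i + j for i in range(4) for j in range(4)}
final_address = swap_with_transpose(memory, element_address(1, 3))
print(memory[element_address(1, 3)], memory[element_address(3, 1)])  # 31 13
```

The model uses one address variable and two data variables, matching the register count stated above.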
Programmable Arrays
The cross-bar programmable AAU described above is a very simple example of a programmable logic unit. In another embodiment of the present invention, the programmable AAU is a Programmable Logic Array (PLA), Programmable Logic Device (PLD), or a Field Programmable Gate Array (FPGA) which can be user programmed, like a RAM or ROM. PLAs and PLDs are typically devices that provide an output which is some combinatorial logic function of the input. These devices typically provide an output which can be expressed as a sum-of-products (AND-OR) of the inputs. In other words, in a PLA or PLD, each input is fed to a collection of AND gates. The outputs of these AND gates are OR'ed together to form the outputs.
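A sum-of-products output can be modeled in a few lines. The sketch below is a generic illustration, not a model of any particular device: the representation of a programmed product term as a tuple of input bit numbers and the function name are assumptions.

```python
def pla_output(inputs, product_terms):
    """Evaluate one sum-of-products output bit.

    `inputs` is an integer whose bits are the PLA inputs. Each product term is
    a tuple of input bit numbers that are ANDed together; the terms are then
    ORed to form the output.
    """
    return int(any(
        all((inputs >> bit) & 1 for bit in term)
        for term in product_terms
    ))


# Three four-input product terms feeding one OR gate (hypothetical programming).
terms = [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11)]
print(pla_output(0x000F, terms))  # 1
print(pla_output(0x0007, terms))  # 0

# A cross-bar connection is the special case of a single one-input product term.
print(pla_output(0x0008, [(3,)]))  # 1
```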
FIG. 5 shows a simple example of a PLA which could be used as the programmable AAU in FIG. . The PLA can perform the bit rearranging operation provided by the cross-bar switch AAU shown in FIG. 3, and the PLA can also provide higher level combinatorial functions not provided by the cross-bar switch. The PLA has sixteen inputs A0-A15, carried on sixteen input lines, and sixteen outputs B0-B15, carried on sixteen output lines. The output line is driven by an output of a three input OR gate. The first input of the OR gate is provided by an output of a first four input AND gate, the second input of the OR gate is provided by an output of a second four input AND gate, and the third input of the OR gate is provided by an output of a third four input AND gate. The four input lines of each of the AND gates cross all of the PLA input lines, thereby creating a plurality of intersections. Each time one of the AND gate input lines crosses one of the PLA input lines, an intersection is created. For example, the first input line of the first AND gate crosses over all of the PLA input lines, and each point where it crosses a PLA input line creates an intersection. Whether or not a given intersection connects the AND gate input line to the PLA input line is determined by how the user programs the PLA. The user may program the PLA such that the intersection connects the two lines. Alternatively, the user may program the PLA such that the intersection does not connect the two lines. All of the intersections may be similarly programmed. Thus, in the PLA, the user may program any output Bi as follows:
ijklmnopqrstuvwxy (1)
where i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y=0 . . . 15
The simple PLA of FIG. 5 is used to illustrate the use of a PLA as a programmable AAU. More complicated PLA structures are known in the art and the use of other PLA, Programmable Array Logic (PAL), or PLD devices as a programmable AAU are within the scope of the present invention.
FPGAs are similar to PLAs, PALs, and PLDs, but FPGAs are complex enough to implement more than simple combinatorial logic. Complex designs including combinatorial and sequential logic with up to several thousand gates and latches may be implemented in an FPGA. To efficiently exploit the logic capacity of FPGAs, synthesis tools and efficient synthesis methods for FPGAs are desirable. FPGA designs can be described either with schematic layout tools or using synthesis from a hardware description language model such as VHDL-1076. VHDL-1076 (VHSIC (Very High Speed Integrated Circuits) Hardware Description Language) is a programming language for describing hardware circuits. VHDL has been an Institute of Electrical and Electronics Engineers (IEEE) standard since 1987. VHDL is “a formal notation intended for use in all phases of the creation of electronic systems. . . . it supports the development, verification, synthesis, and testing of hardware designs, the communication of hardware design data . . . ” [Preface to the IEEE Standard VHDL Language Reference Manual] and especially simulation of hardware descriptions. VHDL models are a DoD requirement for vendors. Simulation systems and other tools, such as synthesis and verification tools based on VHDL, are available.
By using synthesis tools, the modeling, verification and implementation processes of programming an FPGA can be easily accomplished. The major advantage of synthesis-based designs is that the same hardware description language code can be used for verification and implementation. This integrated design flow reduces the amount of code that has to be maintained and the risk of inconsistencies between different models. Once the functional correctness of the FPGA program has been proved, the same code can be used to generate a hardware implementation. Ideally, this process would require only recompilation of the VHDL program with a silicon compiler to program or reprogram an address mode into hardware, if desired.
Some FPGAs contain special circuitry to implement common arithmetic functions such as add, subtract, shift, etc. When such special circuitry is provided, a special software library is advantageously provided to help a programmer in using these special functions. In a preferred embodiment of the present invention, the programmable AAU has special purpose hardware to efficiently implement fast carry logic as found in adders, subtractors, counters and other related function blocks, and is thus able to make use of such special libraries. Such software design tools are available, for example, from XILINX Inc.
One skilled in the art will recognize that combining the FPGA with a DSP allows the program developer to, in effect, create a customized DSP without resorting to custom hardware. The customized DSP can provide special functions to perform address calculations for a specific algorithm as programmed by the programmer. Instead of the simple cross-bar switch described above in connection with FIG. 3, implementing the programmable AAU as an FPGA allows a programmer to implement very complex address calculation algorithms. To implement a desired algorithm (e.g. an FFT) that uses the FPGA, a developer will typically first program the FPGA. The FPGA is programmed by writing an FPGA program in a language such as VHDL. Once the FPGA program is written, the developer writes a DSP program in an appropriate language (e.g. C or assembler) for the DSP. The program for the DSP will use DSP instructions that access functions programmed into the FPGA. To run the algorithm (e.g. the FFT) the developer will load the compiled VHDL program into the programmable AAU memory, load the DSP program into the program memory, and start the DSP program.
In yet another embodiment, the programmable AAU is an FPGA which is programmed not by manually writing code, but rather by a compiler that reads the source code of a DSP program and generates code for the programmable AAU, thereby relieving the developer of the task of programming the FPGA. In yet another embodiment, the programmable AAU is programmed by a software module which “watches” the sequence of addresses generated by a DSP program and then writes code for the programmable AAU to generate that same sequence of addresses.
FIG. 6, comprising FIGS. 6A and 6B, is a block diagram of a Very Long Instruction Word (VLIW) DSP with a first programmable AAU and a second programmable AAU . In the VLIW DSP , an output of a fetch register provides a VLIW to a dispatch unit . The dispatch unit decodes the VLIW and dispatches instructions and data to functional units , , , , , , , and . The functional units , , and send data to, and receive data from, a first register file . The functional units , , and send data to, and receive data from, a second register file . The first and second register files and also send data to, and receive data from a data memory controller which controls data accesses to a data memory . Addresses to the data memory are provided by a first address multiplexer and a second address multiplexer . A first input of the first address multiplexer is provided by the functional unit and a second input of the first address multiplexer is provided by the functional unit . A first input of the second address multiplexer is provided by the functional unit and a second input of the second address multiplexer is provided by the functional unit . An output of the first multiplexer is provided to the data memory by a first address bus . An output of the second multiplexer is provided to the data memory by a second address bus . The programmable AAUs and each comprise an internal memory. The programmable AAU receives data from the functional unit and provides data to a third input of the multiplexer . The programmable AAU receives data from the functional unit and provides data to a third input of the multiplexer .
A VLIW DSP is a form of parallel processor wherein a Very Long Instruction Word (VLIW), comprising several instructions, is fetched and decoded into separate instructions. The instructions decoded from the VLIW are passed to multiple functional units which may operate in parallel. The DSP fetches a VLIW from a program memory or cache into the fetch register . The fetch register provides the VLIW to the dispatch unit which decodes the VLIW into instructions and data for each of the functional units. The functional units operate on immediate data from the VLIW and on data stored in the first register file and the second register file . Data is loaded from the program memory into the registers in the register files , by a load instruction. Data is stored from the register files , into the program memory by store instructions. Addresses for the load and store instructions are computed by the functional units and . Typically, addresses are stored in the register files , and address calculations are performed by the functional units and and the programmable AAUs and . As in the previous embodiments, the programmable AAUs and may be programmed by a programmer to provide new addressing modes.
FIG. 7 is a block diagram that illustrates one embodiment of the data paths used to program a programmable AAU having an internal memory or program register set. Program data is provided to the programmable AAU on a data bus. An address bus provides an address to the programmable AAU. A write strobe is also provided as an input to the programmable AAU. An optional serial input bus is also provided to the programmable AAU.
Data is loaded into the programmable AAU by placing the desired data onto the data bus and placing an address on the address bus. The data is then clocked into the programmable AAU memory by asserting the write strobe. Alternatively, data may be programmed into the programmable AAU using the serial line and one of the many serial line protocols known in the art. The programmable AAU may also be programmed under software control by a special DSP instruction. Alternatively, the data on the data bus and the address on the address bus may be provided under hardware control, either at boot time or during normal operation. Alternatively, the programmable AAU may be programmed by mapping the programmable AAU memory into the normal memory address space of the DSP, or by mapping the programmable AAU memory to an input/output port of the DSP.
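The parallel loading sequence described above may be sketched as follows. This is a minimal behavioral model under stated assumptions, not a description of any particular circuit: the class `ProgrammableAAU`, the helper `load_configuration`, the memory size of 16 words, and the choice to latch on the rising edge of the write strobe are all hypothetical.

```python
class ProgrammableAAU:
    """Toy model of a programmable AAU with a small internal configuration memory."""

    def __init__(self, size=16):
        # Internal memory (or program register set); the size is an assumption.
        self.memory = [0] * size
        self._prev_strobe = False

    def clock(self, data_bus, address_bus, write_strobe):
        """Latch data_bus into memory[address_bus] on a rising edge of write_strobe."""
        if write_strobe and not self._prev_strobe:
            self.memory[address_bus] = data_bus
        self._prev_strobe = write_strobe


def load_configuration(aau, words, base=0):
    """Write each configuration word by driving both buses, then pulsing the strobe."""
    for offset, word in enumerate(words):
        aau.clock(word, base + offset, write_strobe=False)  # buses settle
        aau.clock(word, base + offset, write_strobe=True)   # strobe asserted: word stored
        aau.clock(word, base + offset, write_strobe=False)  # strobe released


aau = ProgrammableAAU()
load_configuration(aau, [0xA5, 0x3C, 0x0F], base=4)
print(aau.memory[4:7])  # [165, 60, 15]
```

The same `load_configuration` loop could equally be driven by a boot-time hardware sequencer or by stores to a memory-mapped window, which are the alternatives the paragraph lists.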
Other Embodiments
Although the present invention has been described with reference to a specific embodiment, other embodiments occur to those skilled in the art. It is to be understood therefore, that the invention herein encompasses all such embodiments that do not depart from the spirit and scope of the invention as defined in the appended claims.
BRIEF DESCRIPTION OF THE FIGURES
The various novel features of the invention are illustrated in the figures listed below and described in the detailed description which follows.
FIG. 1 is a block diagram that illustrates the elements of a typical Digital Signal Processor that provides address calculation using an Address Arithmetic Unit.
FIG. 2 is a block diagram that illustrates a Digital Signal Processor which provides programmable address computation using a programmable Address Arithmetic Unit.
FIG. 3A illustrates a functional block diagram of a programmable AAU which is implemented as a cross-bar switch that reverses all of the bits in a 16-bit word.
FIG. 3B illustrates the bit reversal process provided by the programmable AAU in FIG. 3A.
FIG. 3C illustrates a functional block diagram of a programmable AAU which is implemented as a cross-bar switch that rotates the four low order bits of a 16-bit word.
FIG. 3D illustrates the bit rotate process provided by the programmable AAU in FIG. 3C.
FIG. 4A illustrates a matrix “A” having four rows and four columns.
FIG. 4B illustrates one example of how the matrix of FIG. 4A may be stored in memory.
FIG. 4C illustrates a base field and an index field of a memory address for the matrix storage map of FIG. 4B.
FIG. 5 is a block diagram that illustrates one embodiment of a Programmable Logic Array (PLA).
FIG. 6 is a block diagram that illustrates programmable address computation using a programmable address arithmetic unit in a very long instruction word architecture.
FIG. 7 is a block diagram that illustrates the data paths used to program a programmable address arithmetic unit having an internal memory.
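For orientation, the two cross-bar configurations named in FIGS. 3A through 3D can be modeled as bit permutations on a 16-bit word. The following is a sketch only: the routing-table representation and the names `apply_crossbar`, `BIT_REVERSE`, and `ROTATE_LOW4` are hypothetical, and the direction and distance of the rotation (left by one position) are assumptions, since the figure list does not state them.

```python
WIDTH = 16


def apply_crossbar(word, routing):
    """Route input bit routing[i] to output bit i, as a cross-bar switch would."""
    out = 0
    for out_bit, in_bit in enumerate(routing):
        out |= ((word >> in_bit) & 1) << out_bit
    return out


# FIG. 3A style: output bit i takes input bit 15 - i (full bit reversal).
BIT_REVERSE = [WIDTH - 1 - i for i in range(WIDTH)]

# FIG. 3C style: rotate the four low-order bits, pass the upper twelve through.
# Rotation left by one position is an assumption made for this sketch.
ROTATE_LOW4 = [3, 0, 1, 2] + list(range(4, WIDTH))

print(hex(apply_crossbar(0x0001, BIT_REVERSE)))  # 0x8000
print(hex(apply_crossbar(0xABC1, ROTATE_LOW4)))  # 0xabc2
```

Reprogramming the unit amounts to loading a different routing table; the datapath function `apply_crossbar` itself does not change.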
REFERENCE TO RELATED APPLICATIONS
The present application is a divisional of U.S. application Ser. No. 09/022,285 entitled “Processor with Programmable Addressing Modes” filed Feb. 11, 1998, now U.S. Pat. No. 6,163,836, which claims priority benefit of U.S. provisional application Serial No. 60/054,471 filed Aug. 1, 1997.
CLAIMS
1. A method for programming a programmable system comprising a fixed processor portion and a user programmable address arithmetic unit, the method comprising: writing a first program in a first programming language, said first program configured to implement one or more user-defined address calculation functions in said programmable address arithmetic unit; compiling said first program; generating a first executable image, said first executable image adapted for loading into a first memory coupled to said programmable address arithmetic unit; writing a second program in a second programming language, said second program configured to implement a desired algorithm, wherein the second programming language comprises a fixed set of instructions that make use of a fixed set of addressing modes, wherein at least one of the addressing modes comprises a user defined addressing mode; compiling said second program into object code, said object code comprising a plurality of machine level instructions of a processor, said plurality of machine level instructions comprising at least one instruction that invokes the user defined addressing mode; and generating a second executable image, said second executable image adapted for loading into a second memory coupled to said processor.
2. The method of claim 1, wherein said address calculation function is invoked by an auto-update addressing mode.
3. The method of claim 1, wherein said first programming language is a hardware definition language.
4. The method of claim 3, wherein said second programming language is an assembly language.
5. The method of claim 3, wherein said second programming language is a high level programming language.
6. The method of claim 1, wherein said programmable address arithmetic unit comprises special purpose circuitry, said method further comprising: adapting said first program to use a software library to access said special purpose circuitry.
7. A computer-readable medium containing a first software module having a sequence of instructions drawn from a fixed set of instructions to implement an algorithm using a processor having a fixed portion and a programmable addressing arithmetic unit (PAAU), and a second software module containing configuration codes which define the operation of a user-defined addressing mode, said first and second modules implementing the method of: executing said instructions in said first software module to implement said algorithm; and using said configuration codes to configure the operation of said user-defined addressing mode to be executed in the PAAU; wherein at least one instruction in said first module references an operand using said user-defined addressing mode.
8. The medium of claim 7, whereby at least some of said instructions in said first software module are used to program a digital signal processor.
9. The medium of claim 7, whereby said at least one instruction causes an auto-update to be applied to a pointer operand, and the operation of the auto-update is defined by said user-defined addressing mode.
10. The medium of claim 9, whereby an assembly language mnemonic is used to specify said auto-update.
11. The medium of claim 10, whereby said first software module comprises a second instruction, said second instruction specifying one of a plurality of user-defined addressing modes to be selected to define the operation of said auto-update.
12. The medium of claim 7, whereby at least some of said configuration codes in said second software module are used to configure a programmable logic block within said PAAU.
13. The medium of claim 7, whereby at least some of said configuration codes in said second software module are used to program a set of sequential logic operations as implemented by a microsequenced state machine within said PAAU.
14. The medium of claim 7, whereby said configuration codes in said second software module are used to program a crossbar switching element within said PAAU.
15. The medium of claim 7, whereby said first software module comprises a plurality of subsets of instructions, each subset of instructions to be dispatched to one of a plurality of functional units in a multi-issue processor, whereby one of said subsets is dispatched to a functional unit comprising said PAAU.
16. A method of executing software in a computerized system, said system comprising a fixed processor portion and a programmable addressing unit (PAAU), a first software module containing a sequence of instructions drawn from a fixed set of instructions to implement an algorithm using a processor having a fixed portion and the PAAU, and a second software module containing configuration codes defining the operation of a user-defined addressing mode supplied by said PAAU, the method comprising: executing instructions in said first software module to implement an algorithm; and using said configuration codes to configure the operation of said user-defined addressing mode to be executed in the PAAU; whereby at least one instruction in said first module references an operand using said user-defined addressing mode.
17. The method of claim 16, further comprising using at least some of said instructions in said first software module to program a digital signal processor.
18. The method of claim 16, further comprising: defining the operation of the auto-update by said user-defined addressing mode; and causing an auto-update to be applied to a pointer operand using said at least one instruction.
19. The method of claim 18, further comprising said auto-update using an assembly language mnemonic.
20. The method of claim 19, whereby said first software module comprises a second instruction, and said method further comprises specifying one of a plurality of user-defined addressing modes to be selected to define the operation of said auto-update using said second instruction.
21. The method of claim 16, further comprising using at least some of said configuration codes in said second software module to configure a programmable logic block within said PAAU.
22. The method of claim 16, further comprising using at least some of said configuration codes in said second software module to program a set of sequential logic operations as implemented by a microsequenced state machine within said PAAU.
23. The method of claim 16, further comprising using said configuration codes in said second software module to program a crossbar switching element within said PAAU.
24. The method of claim 16, whereby said first software module comprises a plurality of subsets of instructions, each subset of instructions to be dispatched to one of a plurality of functional units in a multi-issue processor, the method further comprising dispatching at least one of said subsets to a functional unit comprising said PAAU.
25. A computerized system adapted for loading a first software module having a plurality of instructions drawn from a fixed instruction set and a second software module having at least one configuration code, the system comprising: a processor having a fixed portion and a programmable addressing arithmetic unit (PAAU); wherein said processor is adapted to: (i) execute at least a first one of said plurality of instructions to at least partially implement an algorithm, (ii) configure a user-defined addressing mode in said PAAU using said at least one configuration code, and (iii) execute at least a second one of said plurality of instructions, said at least second instruction referencing an operand using said user-defined addressing mode, said second instruction also being executed to at least partially implement said algorithm.
26. The system of claim 25, whereby at least some of said instructions in said first software module are used to program a digital signal processor.
27. The system of claim 25, whereby said at least one instruction causes an auto-update to be applied to a pointer operand, and the operation of the auto-update is defined by said user-defined addressing mode.
28. The system of claim 27, whereby an assembly language mnemonic is used to specify said auto-update.
29. The system of claim 28, whereby said first software module comprises a second instruction, said second instruction specifying one of a plurality of user-defined addressing modes to be selected to define the operation of said auto-update.
30. The system of claim 25, whereby at least some of said configuration codes in said second software module are used to configure a programmable logic block within said PAAU.
31. The system of claim 25, whereby at least some of said configuration codes in said second software module are used to program a set of sequential logic operations as implemented by a microsequenced state machine within said PAAU.
32. The system of claim 25, whereby said configuration codes in said second software module are used to program a crossbar switching element within said PAAU.
33. The system of claim 25, whereby said first software module comprises a plurality of subsets of instructions, each subset of instructions to be dispatched to one of a plurality of functional units in a multi-issue processor, whereby one of said subsets is dispatched to a functional unit comprising said PAAU.
34. A computer-implemented method for programming a processor comprising a programmable addressing arithmetic unit (PAAU), the method comprising: allowing a sequence of instructions defining a program for implementing an algorithm to execute, thereby generating a sequence of addresses; observing at least a subsequence of said sequence of addresses; and generating a configuration program for said PAAU, said configuration program defining a user-defined auto-update addressing mode, wherein successive executions of said user-defined auto-update addressing mode are operative to regenerate said subsequence.
35. The method of claim 34, whereby said act of observing comprises observing a subsequence that corresponds to an address history sequence of a pointer variable.
36. The method of claim 35, further comprising defining an auto-update operation using said user-defined addressing mode, said auto-update defining at least one method for advancing from a current address element of said subsequence to a successive address-element of said subsequence.
37. The method of claim 36, further comprising: modifying said sequence of instructions by inserting into at least a subset of specified instructions a mnemonic that specifies said auto-update operation, such that when said modified sequence of instructions is executed, said pointer undergoes said observed subsequence of addresses.
38. The method of claim 37, further comprising executing said sequence of instructions in N cycles and said modified set of instructions in M cycles, with the value of M being less than that of N.
39. A method of executing software in a computerized system, said system comprising a means for processing digital data, said means for processing comprising a fixed processor portion and programmable means for addressing, a first software module containing instructions defining the operation of an algorithm and a second software module containing configuration codes defining the operation of a user-defined addressing mode supplied by said means for addressing, the method comprising the steps of: executing instructions in the first software module for implementing said algorithm, wherein the instructions are drawn from a fixed instruction set; and using said configuration codes to configure the operation of said user-defined addressing mode; wherein at least one instruction in said first module is used for referencing an operand, said referencing of said operand being accomplished at least in part with said user-defined addressing mode.
40. For use with a processor having a fixed architecture portion and a programmable address arithmetic unit (PAAU), a method of executing an algorithm by executing a sequence of opcodes assembled based upon an assembly language having a fixed instruction set and at least one mnemonic that refers to an operand in memory according to a user-defined addressing mode, the method comprising: writing a set of configuration codes into a storage area to define a first operation and a second operation of the user-defined addressing mode; executing a first instruction that invokes the user-defined addressing mode and calculating a first operand address using the first operation of the user-defined addressing mode; executing a user-defined addressing mode change instruction that causes the second user-defined addressing mode to be activated into a PAAU; and executing a second instruction that invokes the user-defined addressing mode and calculating a second operand address using the second operation of the user-defined addressing mode.
41. The method of claim 40, wherein the user-defined addressing mode change instruction further causes the first user-defined addressing mode to be deactivated.
42. A computerized system comprising: a processor comprising a fixed architecture portion and a programmable address arithmetic unit (PAAU); a first software module comprising a collection of opcodes assembled from a fixed assembly language that has a fixed set of instructions, a fixed set of fixed addressing modes, and at least one user-defined auto-update addressing mode, wherein the execution of the first software module results in the implementation of an application algorithm through a sequence of individual op-code executions defined by a program flow; and a configuration data module comprising at least one configuration code adapted to configure the operation of at least one user-defined auto-update addressing mode in the PAAU; wherein at least a subsequence of the sequence of individual op-code executions makes reference to an operand using the user-defined auto-update addressing mode, and the execution of successive instructions in the subsequence of instructions results in an addressing pattern being generated, wherein the addressing pattern is nonlinear and is dependent on the application algorithm, and wherein the user-defined auto-update addressing mode is configured to generate the addressing pattern using fewer instruction cycles than would be possible by using a combination of instructions involving the fixed set of addressing modes.
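As a non-limiting illustration of the observe-and-regenerate approach recited in claims 34 through 38, the following sketch records the address history of a pointer and derives a configuration from it. Representing the configuration as a next-address lookup table is an assumption made for this example, as are the names `generate_configuration` and `auto_update`; the claims do not prescribe any particular form for the configuration program.

```python
def generate_configuration(observed):
    """Build a next-address table from an observed address subsequence."""
    table = {}
    for current, following in zip(observed, observed[1:]):
        # A table-based auto-update needs exactly one successor per address.
        if table.setdefault(current, following) != following:
            raise ValueError("address %d has two different successors" % current)
    return table


def auto_update(pointer, table):
    """Hypothetical user-defined auto-update: advance to the successor address."""
    return table[pointer]


# Observed address history of a pointer (bit-reversed order for eight elements).
observed = [0, 4, 2, 6, 1, 5, 3, 7]
table = generate_configuration(observed)

pointer = observed[0]
regenerated = [pointer]
for _ in range(len(observed) - 1):
    pointer = auto_update(pointer, table)
    regenerated.append(pointer)

print(regenerated == observed)  # True
```

A table works only when each address in the observed subsequence has a single successor; a pattern that revisits an address with different successors would need additional state in the configuration.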
