Patents of ZAO "MCST"

ZAO "MCST" holds 35 US patents and 9 patents of the Russian Federation. The subject and number of each patent are listed below. The US patents are given with their abstracts; their full texts are available at: http://www.uspto.gov/patft/index.html

Microprocessor architecture and microarchitecture

1 Wide instruction word architecture central processor
United States Patent No. 5,418,975
Abstract

A central processor for scientific-technical, economic-statistical computations, for solving the problems of modelling and control with the architecture of an extended instruction word comprises instruction and data buffer memories 1 and 3, respectively, a control device 2, a data commutator 4, an arithmetic-logic device 5, record-calling, indexing, associative memory, mathematical-to-physical address conversion, interface, subprogram device (6-11), as well as a control character device 13 and an operand readiness device 14, and provides high efficiency both on vector and scalar computations.

2 Computer caching method and apparatus
United States Patent No. 5,781,924
Abstract

When cache misses occur simultaneously on two or more ports of a multi-port cache, different replacement sets are selected for different ports. The replacements are performed simultaneously through different write ports. In some embodiments, every set has its own write ports. The tag memory of every set has its own write port. In addition, the tag memory of every set has several read ports, one read port for every port of the cache. For every cache entry, a tree data structure is provided to implement a tree replacement policy (for example, a tree LRU replacement policy). If only one cache miss occurred, the search for the replacement set is started from the root of the tree. If multiple cache misses occurred simultaneously, the search starts at a tree level that has at least as many nodes as the number of cache misses. For each cache miss, a separate node is selected at that tree level, and the search for the respective replacement set starts at the selected node.
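
The search described above can be sketched as follows. This is a minimal illustration, not the patented circuit: it assumes a binary tree stored in heap order (root at index 1), one direction bit per internal node, and that the first nodes of the chosen level are the ones selected. The names `find_victim` and `victims_for_misses` are hypothetical, and updating the tree bits after a replacement is left out.

```python
def find_victim(bits, start_node, num_ways):
    """Follow the direction bits from start_node down to a leaf.

    bits[n] is the direction bit of internal node n (0 = left, 1 = right);
    internal nodes are 1 .. num_ways-1, leaves are num_ways .. 2*num_ways-1.
    """
    node = start_node
    while node < num_ways:
        node = 2 * node + bits[node]
    return node - num_ways  # leaf index -> replacement set (way) number


def victims_for_misses(bits, num_misses, num_ways):
    """Pick one replacement set per simultaneous miss (num_misses <= num_ways)."""
    # Find the first tree level that has at least num_misses nodes.
    level = 0
    while (1 << level) < num_misses:
        level += 1
    first = 1 << level  # heap index of the leftmost node at that level
    # Assumption: take the first num_misses nodes of the level as start points.
    return [find_victim(bits, first + i, num_ways) for i in range(num_misses)]
```

Because each search starts at a different node of the same level, the searches walk disjoint subtrees and cannot return the same set, which is what allows the replacements to proceed through different write ports at once.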

3 Architectural support for execution control of prologue and epilogue periods of loops in a VLIW processor
United States Patent No. 5,794,029
Abstract

For certain classes of software pipelined loops, prologue and epilogue control is provided by loop control structures, rather than by predicated execution features of a VLIW architecture. For loops compatible with two simple constraints, code elements are not required for disabling garbage operations during prologue and epilogue loop periods. As a result, resources associated with implementation of the powerful architectural feature of predicated execution need not be squandered to service loop control. In particular, neither increased instruction width nor an increased number of instructions in the loop body is necessary to provide loop control in accordance with the present invention. Fewer service functions are required in the body of a loop. As a result, loop body code can be more efficiently scheduled by a compiler and, in some cases, fewer instructions will be required, resulting in improved loop performance. Loop control logic includes loop control registers having an epilogue counter field, a shift register, a side-effects enabled flag, a current loop counter field, a loop mode flag, and side-effects manual control and loads manual control flags. Side-effects enabling logic and load enabling logic respectively issue a side-effects enabled predicate and a loads enabled predicate to respective subsets of execution units. Software pipelined simple and inner loops are supported.

4 Array prefetch apparatus and method
United States Patent No. 5,889,985
Abstract

An array prefetch system improves processor performance by automatically tuning a statically compiled and compacted loop program at run-time to accommodate variations in latency of memory read operations. Using the array prefetch system, the processor, while awaiting completion of a data access, continues to generate requests for subsequent iterations rather than fully halting execution until a read access is finished.

5 Multifunction execution unit having independently operable adder and multiplier
United States Patent No. 5,923,871
Abstract

Floating point performance in a VLIW processor is increased through concatenation of two floating point units, one an adder and another a multiplier, which execute independently of one another but which operate in cooperation for certain combinations of issued operations. In particular, the floating point adder and the floating point multiplier may be activated individually by very long instruction words, which are also called wide instructions, that issue operations to either unit or both units at one time. For other wide instructions, both the floating point adder and the floating point multiplier may be sequentially activated by a single instruction with three operands. The first two operands are used by one of the units, either the floating point adder or the floating point multiplier, and the unit is activated. The result generated by the activated unit and the third operand are applied as operands for usage by one of the units, again either the floating point adder or the floating point multiplier, and the unit is activated. In this manner, a single instruction activates the floating point multiplier and the floating point adder in combination for execution of two operations. Each of the floating point adder and the floating point multiplier execute once or either executes twice by the issue of a single instruction.
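
The three-operand combined form can be pictured with a small software model. This is only a sketch of the data flow: the unit names `fadd` and `fmul` and the function `combined` are hypothetical, and Python floats stand in for the hardware units without modelling their rounding or timing.

```python
import operator

# Hypothetical names for the two independently operable units.
UNITS = {"fadd": operator.add, "fmul": operator.mul}


def combined(first, second, a, b, c):
    """Model one three-operand instruction: (a first b) second c.

    The first unit consumes operands a and b; its result and operand c
    are then fed to the second unit, which may be the same unit again.
    """
    intermediate = UNITS[first](a, b)
    return UNITS[second](intermediate, c)
```

For example, `combined("fmul", "fadd", 2.0, 3.0, 4.0)` models a multiply followed by an add, while `combined("fadd", "fadd", 1.0, 2.0, 3.0)` models the case where one unit executes twice.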

6 Architectural support for software pipelining of nested loop
United States Patent No. 5,958,048
Abstract

For certain classes of software pipelined loops, prologue and epilogue portions of adjacent inner loops in a nested loop can be overlapped. In this way, outer loop code, as well as inner loop code, can be software pipelined. Architectural support for software pipelined nested loops is provided by a set of loop parameter and status registers and by an implementation of loop state dependent, multiway control transfers. For loop body code compatible with two simple constraints, the present invention does not require additional code elements for disabling garbage operations during prologue and epilogue loop periods of adjacent inner loops. Nested loop control allows overlap between the epilogue period of a prior inner loop and the prologue period of a next inner loop. As a result, nested loop code can be more efficiently scheduled by a compiler for execution on a processor such as VLIW processor which provides architectural support for software pipelined nested loops, thereby providing improved loop performance. Loop state dependent, multiway control transfers are provided by multi-way control transfer logic which includes the loop parameter and status registers and a branch target selector for selecting control transfer addresses corresponding to inner loop body code, a start patch, and a finish patch from control transfer address registers in accordance with loop state.

7 Method and apparatus for packing and unpacking wide instruction word using pointers and masks to shift word syllables to designated execution units groups
United States Patent No. 5,983,336
Abstract

An unpacking circuit and operating method in a very long instruction word (VLIW) processor provides for parallel handling of a packed wide instruction in which a packed wide instruction is divided into groups of syllables. An unpacked instruction representation includes a plurality of syllables, which generally correspond to operations for execution by an execution unit. The syllables in the unpacked instruction representation are assigned to groups. The packed instruction word includes a sequence of syllables and a header. The header includes a descriptor for each group. The descriptor includes a mask and may include a displacement designator. The multiple groups are handled in parallel as the displacement designator identifies a starting syllable. The mask designates the syllables which are transferred from the packed instruction to the unpacked representation and identifies the position of NOPs in the unpacked representation.
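
A minimal software sketch of the unpacking step, assuming each group descriptor is a `(mask, displacement)` pair, that bit i of the mask corresponds to slot i of the group, and that every group has the same width. These layout choices and the names `unpack_group` and `unpack` are illustrative assumptions; the abstract does not specify them.

```python
NOP = "nop"


def unpack_group(syllables, mask, displacement, width):
    """Expand one group of the packed word into `width` slots."""
    slots = []
    nxt = displacement  # index of the group's first syllable in the packed word
    for slot in range(width):
        if (mask >> slot) & 1:
            slots.append(syllables[nxt])  # mask bit set: take the next syllable
            nxt += 1
        else:
            slots.append(NOP)  # mask bit clear: this slot is a NOP
    return slots


def unpack(syllables, descriptors, width):
    """Unpack every group; each group depends only on its own descriptor."""
    return [unpack_group(syllables, m, d, width) for m, d in descriptors]
```

Since the displacement tells each group where its syllables start, no group has to wait for another to be scanned, which is the property that lets hardware handle the groups in parallel.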

8 Method and system for asynchronous array loading
United States Patent No. 6,243,822
Abstract

The present invention decreases the delay associated with loading an array from memory by employing an asynchronous array preload unit. The asynchronous array preload unit provides continuous preliminary loading of data arrays located in a memory subsystem into a prefetch buffer. Array loading is performed asynchronously with respect to execution of the main program.

9 Integrity of tagged data
United States Patent No. 6,549,903
Abstract

A method and computer apparatus are presented for providing a secure data architecture for computer memory of a processor. The apparatus comprises a memory unit and a processing unit. Data are stored in the memory unit and manipulated by the processing unit, which is programmed to implement the data architecture. Tagged single data words are formed by concatenating a tag to each of the single data words. Each of the tags takes a value that corresponds to the data type of the single data word to which it is concatenated. A data multiword is created by concatenating tagged single data words having the same data type. The data multiword is stored within a location in the computer memory, the location selected to ensure alignment of the data multiword in accordance with its length. An effective tag value is constructed for the data multiword by concatenating all of its single word tags. An operation is prevented from being performed on the data multiword if the effective tag value is not one of a predetermined list of valid effective tag values for the operation.
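
The effective-tag check can be sketched in a few lines. The 2-bit tag encoding, the operation names, and the table `VALID_EFFECTIVE_TAGS` are all hypothetical values chosen for illustration; only the concatenate-then-look-up scheme comes from the abstract.

```python
TAG_BITS = 2
TAG_INT, TAG_FLOAT, TAG_PTR = 0b01, 0b10, 0b11  # hypothetical tag values


def effective_tag(multiword):
    """Concatenate the tags of all single words of a multiword.

    multiword is a list of (tag, value) pairs, most significant word first.
    """
    eff = 0
    for tag, _value in multiword:
        eff = (eff << TAG_BITS) | tag
    return eff


# Hypothetical per-operation lists of valid effective tag values.
VALID_EFFECTIVE_TAGS = {
    "add_int_pair": {(TAG_INT << TAG_BITS) | TAG_INT},
    "add_float_pair": {(TAG_FLOAT << TAG_BITS) | TAG_FLOAT},
}


def operation_allowed(op, multiword):
    """Return True only if the multiword's effective tag is valid for op."""
    return effective_tag(multiword) in VALID_EFFECTIVE_TAGS[op]
```

A multiword whose halves no longer share a type, for instance after a partial overwrite, yields an effective tag outside the list, so the operation is refused.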

10 Branch preparation
United States Patent No. 6,560,775
Abstract

A method and system for preparing branch instruction of a computer program, for compiling and execution in a computer system, in which each transfer instruction is split into two instructions: a control transfer preparation instruction and a control transfer instruction, wherein the control transfer preparation instruction contains the transfer address and is placed by the compiler several instructions ahead of the control transfer instruction, so that the number of clock cycles in the pipeline between transfer condition generation and transfer itself would be reduced.

11 Methods and apparatus for conflict free Execution of Integer and Floating-point Operations with a Common Register File
United States Patent No. 6,668,316
Abstract

In a wide instruction architecture processor device, an instruction execution unit provides integer and floating point capability within its constituent arithmetic logic channels. Results are written out to a register file where integer results are given higher priority over floating point results, which are buffered, in order to increase integer operation throughput. By buffering floating point results and giving priority to integer results, fewer register file write ports are needed. A bypass mechanism allows access to floating point results during their pendency in the buffer. Dual serially-configured integer units are configured to enable two-operand and combined (three-operand) instructions to be delivered to an arithmetic and logic channel at every clock cycle. Similarly, dual parallel pipelined floating point units are configured to permit two-operand and combined (three-operand) floating point instructions to be delivered to an arithmetic and logic channel on each clock cycle.

12 Method and Apparatus for Performing Pipelined SRT Division
United States Patent No. 6,751,645
Abstract

An SRT division unit for performing a novel SRT division algorithm is presented. The novel SRT division algorithm comprises a method for performing SRT division using a radix r. As one skilled in the art will appreciate, the radix r dictates the number of quotient-bits k generated during a single iteration. The relationship between radix r and the number of quotient-bits k generated in a single iteration is r = 2^k. The number of iterations needed to determine all quotient-digits is N, such that N = 54/k for a 64-bit floating point value. In accordance with one embodiment of the present invention, the SRT division unit generates a scaling factor M, which comprises scaling sub-factors M1 and M2 according to the relationship M = r*M1 + M2. Next, the division unit generates a scaled divisor Y by multiplying a divisor DR by scaling factor M, such that said scaled divisor Y = DR*M = r*(DR*M1) + DR*M2. In addition, the division unit generates partial remainder values w[00] and w[0] by multiplying a dividend DD by scaling sub-factor M1 and scaling factor M, respectively. Partial remainder value w[00] = DD*M1, and partial remainder value w[0] = DD*M = r*(DD*M1) + DD*M2. Scaled divisor Y and partial remainders w[0] and w[00] then are used to generate quotient-digits and additional partial remainders. Accordingly, the division unit performs iterations j which generate quotient-digits according to the formula q[j] = SEL(r^2 * w_msb[j-2], q[j-1]). Also, the iterations generate additional partial remainders w[j] according to the formula w[j] = r*w[j-1] - q[j-1]*Y. N iterations are performed, generating all quotient-digits for the division operation.
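
To make the partial-remainder recurrence concrete, here is a deliberately simplified radix-r digit recurrence in exact arithmetic. It is not the patented algorithm: digit selection uses a plain floor division giving digits 0..r-1 rather than the SEL function and redundant digit set of SRT, there is no prescaling by M, and the digit is consumed in the same iteration rather than one step later. The function name and parameters are hypothetical.

```python
from fractions import Fraction


def digit_recurrence_divide(dividend, divisor, r=4, digits=8):
    """Produce `digits` radix-r quotient digits for 0 <= dividend < divisor.

    Each step computes w = r*w - q*Y, the same shape as the recurrence
    in the abstract, with q chosen by exact floor division (an assumption
    made for clarity, not SRT digit selection).
    """
    w = Fraction(dividend)
    y = Fraction(divisor)
    quotient_digits = []
    for _ in range(digits):
        shifted = r * w
        q = shifted // y          # digit in 0 .. r-1
        w = shifted - q * y       # next partial remainder, 0 <= w < y
        quotient_digits.append(int(q))
    return quotient_digits, w
```

With r = 4 each iteration yields k = 2 quotient bits, so the abstract's N = 54/k corresponds to 27 iterations for a 64-bit floating point value.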

13 Method for prioritizing operations within a pipelined microprocessor based upon required results
United States Patent No. 7,003,650
Abstract

A method and apparatus for solving the output dependence problem in an explicit parallelism architecture microprocessor with consideration for implementation of the precise exception. In case of an output dependence hazard, the issue into bypass of a result of the earlier issued operation having an output hazard is cancelled. Latencies of short instructions are aligned by including additional stages on the way of writing the results into the register file in shorter executive units, which preserves the issue order while writing the results into the register file. For long and unpredictable latencies of the instructions, writing of the result of the earlier issued operation having an output dependence hazard into the register file is cancelled after checking for no precise exception condition. All additional stages are connected to the bypass so as not to increase the result access time when the result is used in the following operations.

14 Method and system for asynchronous loading of data arrays
Russian Federation Patent No. 2166791

15 Control system for a pipelined loop of a wide instruction word processor
Russian Federation Patent No. 2184389

16 Method for filtering interprocessor requests in multiprocessor computing systems and device for implementing it
Russian Federation Patent No. 2189630

17 Device for summing mantissas and normalizing the result in floating-point computations
Russian Federation Patent No. 2242045

18 "Device for selecting the minimum or maximum numeric value of two n-bit numbers"
Russian Federation Patent No. 2262130

19 "Device for correcting the exponent of the result of adding floating-point numbers"
Russian Federation Patent No. 2267806

Compilation technologies

20 Compiler method and apparatus for elimination of redundant speculative calculation from innermost loops
United States Patent No. 6,301,706
Abstract

A method and system for use with VLIW processing architectures for avoiding redundant speculative computations in the compilation of the innermost loops. The method includes identifying a plurality of compiled flow paths, where each of the paths includes a plurality of conditions associated with the loop that permits transformation of the loop for more optimum execution. It is then determined whether the loop has an inductive variable and a conditional statement that depends on the inductive variable. It is also determined whether the loop set up values of the inductive variables to subsets, and at least one of which the conditional statement is a loop invariant. Finally, if conditions in the determination steps satisfy the conditions of one of the paths, the loop is transformed into two consecutive loops executable with a reduced set of values of the inductive variable.
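
The final transformation step can be illustrated on a toy loop. In the sketch below the conditional `i < k` depends on the inductive variable `i` and is loop invariant on each of the two index sub-ranges, so the loop is split into two consecutive loops with no test inside either body. The functions and the loop body are hypothetical examples, and the compiler's analysis that decides when the split is legal is not modelled.

```python
def original_loop(a, k):
    """Loop whose conditional depends on the inductive variable i."""
    out = []
    for i in range(len(a)):
        if i < k:
            out.append(a[i] * 2)
        else:
            out.append(a[i] + 1)
    return out


def split_loops(a, k):
    """Same computation as two consecutive loops over reduced index sets."""
    n = len(a)
    boundary = min(max(k, 0), n)  # clamp the split point into [0, n]
    out = []
    for i in range(0, boundary):   # here i < k always holds
        out.append(a[i] * 2)
    for i in range(boundary, n):   # here i < k never holds
        out.append(a[i] + 1)
    return out
```

Neither loop computes both branches speculatively, which is the redundancy the abstract aims to eliminate.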

21 Computer system and method for parallel computations using table approximation methods
United States Patent No. 6,363,405
Abstract

A method optimizes function evaluations performed by a VLIW processor through enhanced parallelism by evaluating the function by table approximation using decomposition into a Taylor series.
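
As a rough illustration of table approximation with a Taylor decomposition, the sketch below evaluates sin(x) from tabulated values at grid points plus a short Taylor expansion around the nearest grid point. The choice of sine, the spacing `STEP`, the table size, and the input range [0, 2] are assumptions made for this example; the abstract does not state which functions or parameters are used.

```python
import math

STEP = 1.0 / 64   # hypothetical table spacing
SIZE = 129        # grid points 0, STEP, ..., 2.0
SIN_TABLE = [math.sin(i * STEP) for i in range(SIZE)]
COS_TABLE = [math.cos(i * STEP) for i in range(SIZE)]


def sin_table_taylor(x):
    """Approximate sin(x) for 0 <= x <= 2 from table values and a Taylor series."""
    if not 0.0 <= x <= 2.0:
        raise ValueError("x outside the tabulated range [0, 2]")
    i = round(x / STEP)          # nearest grid point
    d = x - i * STEP             # small offset, |d| <= STEP / 2
    s, c = SIN_TABLE[i], COS_TABLE[i]
    d2 = d * d
    # Third-order Taylor expansion around the grid point, split into two
    # independent partial sums that could be evaluated in parallel.
    even_part = s * (1.0 - d2 / 2.0)
    odd_part = c * d * (1.0 - d2 / 6.0)
    return even_part + odd_part
```

The two partial sums share no intermediate results apart from `d2`, which hints at how such an evaluation can fill several execution units of a wide-instruction machine at once.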

22 Computer method and apparatus for compilation of multi-way decisions
United States Patent No. 6,412,105
Abstract

Computer method of compiling a multi-way decision statement for VLIW processing is described. The method comprises: (a) generating profile data for a multi-way decision statement, such as a switch statement; (b) identifying at least one most probable alternative of the multi-way decision and a set of constants associated with the identified alternative using the profile data; (c) determining a probable subset of the identified constants based on the profile data; (d) constructing a conditional statement for the identified alternative using the probable subset of constants; and (e) moving out the identified at least one alternative from the multi-way decision statement.
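
The steps listed above can be sketched at a high level. In this illustration the multi-way decision is a dictionary from constants to alternative names, the profile is a count of observed selector values, and the "probable subset" is taken to be every constant of the hot alternative that was observed at least once. These representations, the threshold, and the name `build_dispatch` are assumptions; the sketch also assumes the profile contains at least one value present in the case table.

```python
from collections import Counter


def build_dispatch(case_table, profile, default):
    """Return (dispatch, hot_alternative, hot_constants).

    case_table: dict mapping each case constant to its alternative.
    profile: mapping from observed selector values to their counts.
    """
    counts = Counter(profile)
    # Identify the most probable alternative from the profile data.
    per_alternative = Counter()
    for const, count in counts.items():
        if const in case_table:
            per_alternative[case_table[const]] += count
    hot_alternative = per_alternative.most_common(1)[0][0]
    # Probable subset of the constants associated with that alternative.
    hot_constants = frozenset(
        c for c, alt in case_table.items()
        if alt == hot_alternative and counts[c] > 0
    )

    def dispatch(x):
        if x in hot_constants:             # conditional moved ahead of the switch
            return hot_alternative
        return case_table.get(x, default)  # remaining multi-way decision

    return dispatch, hot_alternative, hot_constants
```

Constants of the hot alternative that never appeared in the profile still reach the right alternative through the remaining multi-way decision, so the transformation does not change behavior.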

23 Cache miss saving for speculative load operation
United States Patent No. 6,516,462
Abstract

Compiler optimization methods and systems for preventing delays associated with a speculative load operation on a data when the data is not in the data cache of a processor. A compiler optimizer analyzes various criteria to determine whether a cache miss savings transformation is useful. Depending on the results of the analysis, the load operation and/or the successor operations to the load operation are transferred into a predicated mode of operation to enhance overall system efficiency and execution speed.

24 Method for removing dependent store-load pair from critical path
United States Patent No. 6,516,463
Abstract

A method, implemented by a compiler, for removing a store-load dependency from a critical path utilizes a compare address operation to determine at run time whether a dependency actually exists. The operand to be stored is held in a temporary register and provided directly to the operations, using load operation result, in dependence on the value of the compare address operation result, so that the dependency is removed.
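
A small model of the transformation, with memory represented as a Python list and a trivial consumer (`+ 1`) standing in for the operations that use the load result. Both functions and their parameter names are hypothetical; the point is only the ordering of the load, the address comparison, and the selection.

```python
def original_sequence(mem, store_addr, value, load_addr):
    """Store, then a possibly dependent load, then a use of the loaded value."""
    mem[store_addr] = value
    loaded = mem[load_addr]       # must wait for the store if addresses match
    return loaded + 1


def transformed_sequence(mem, store_addr, value, load_addr):
    """Same result with the load no longer ordered after the store."""
    temp = value                              # operand kept in a temporary register
    early = mem[load_addr]                    # load issued ahead of the store
    same_address = store_addr == load_addr    # compare address operation
    mem[store_addr] = temp
    operand = temp if same_address else early # select by the comparison result
    return operand + 1
```

When the addresses differ, the consumer uses the early load; when they match, it uses the value held in the temporary, so in neither case does it wait for the store to complete.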

25 Critical path optimization-optimizing branch operation insertion
United States Patent No. 6,526,573
Abstract

A compiler optimization method for optimizing a scheduled block of instructions inserts a conditional branch instruction in place of a merge instruction to select between alternative paths when a condition is resolved.

26 Critical path optimization - unzipping
United States Patent No. 6,564,372
Abstract

A method and apparatus for optimizing scheduling of a block of program instructions to remove a condition resolving instruction from the critical path where the resolution of a condition controls the selection between input results, generated by predecessor operations, by a merge operation which passes the selected result to a successor operation. In a preferred embodiment, the successor operation is "unzipped" by duplicating the successor operations, providing predecessor results directly to the duplicated successor operations, and scheduling the duplicated successor operations prior to the merge.

27 Critical path optimization - unload hard Extended Scalar Block
United States Patent No. 6,584,611
Abstract

A method, implemented in a compiler, of balancing the workload between blocks in a control flow to reduce the overall execution time of a control block includes steps for identifying "hard" blocks that consume excess resources, selecting a hard block to unload, and unloading critical operations from a hard block to a control flow predecessor.

28 Computer system and method for parallel computations using table approximation
United States Patent No. 6,567,831
Abstract

A method optimizes function evaluations performed by a VLIW processor through enhanced parallelism by evaluating the function by table approximation using decomposition into a Taylor series.

29 Profile Driven Code Motion and Scheduling
United States Patent No. 6,594,824
Abstract

A method and apparatus for generating an optimized intermediate representation of source code for a computer program are described. An initial intermediate representation is extracted from the source code by organizing it as a plurality of basic blocks that each contain at least one program instruction ordered according to respective estimated profit values. A goal function that measures the degree of optimization of the program is calculated in accordance with its intermediate representation. The effect on the goal function of modifying the intermediate representation by moving an instruction from one of the basic blocks to each of its predecessors is tested iteratively, and the modified intermediate representation is adopted if it causes a reduction in the goal function.

30 Register economy heuristic for a cycle-driven multiple issue instruction scheduler
United States Patent No. 6,718,541
Abstract

A method for scheduling operations utilized by an optimizing compiler to reduce register pressure on a target hardware platform assigns register economy priority (REP) values to each operation in a basic block. For each time slot, operations are scheduled in order of their lowest REP values.

31 Method for emulating hardware features of a foreign architecture in a host operating system environment
United States Patent No. 6,732,220
Abstract

The present invention relates to a computer system adapted to efficiently execute binary translated code. In accordance with the present invention, foreign code is stored in a foreign virtual memory space, translated to acquire binary translated code, which is stored in a host virtual memory space and then executed. The host computer system isolates each virtual memory configuration into separate processes referred to as a virtual machine while enabling multiple virtual machines to exist simultaneously. Execution may switch from one virtual machine to another merely by switching to a new page table, where each page table describes the memory configuration of a virtual machine. Common system level resources are shared by the virtual machines under the control of a virtual memory manager.

32 Hardware supported software pipelined loop prologue optimization
United States Patent No. 6,954,927
Abstract

A method for optimizing a software pipelineable loop in a software code is provided. The loop comprises one or more pipelined stages and one or more loop operations. The method comprises evaluating an initiation interval time (IN) for a pipelined stage of the loop. A loop operation time latency (Tld) and a number of loop operations (Np) from the pipelined stages to peel based on IN and Tld is then determined. The loop operation is peeled Np times and copied before the loop in the software code. A vector of registers is allocated and the results of the peeled loop operations and a result of an original loop operation is assigned to the vector of registers. Memory addresses for the results of the peeled loop operations and original loop operation are also assigned.

33 Method of fast execution of binary translated code through data base low-level code correspondence checking
United States Patent No. 6,820,255
Abstract

The present invention increases efficiency of a binary translation process by correlating selected foreign code to previously translated binary host code. This approach eliminates repetitive translation of foreign code when the foreign code is executed on a host computer system. During the translation process, a database of translated foreign code is populated and thereafter a software layer checks for correspondence between the foreign code and binary code stored in the database. If the database contains corresponding code, that code is transferred to system memory for execution and there is no need to retranslate the foreign code. Minimizing the time spent translating the foreign code results in improved execution speed on the host computer system. The software layer creates an index into the database by hashing the foreign code or by using the storage location of the foreign code. By way of example, the sector of a disk drive where the foreign code is stored determines the index into the database.
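
The database lookup described above can be sketched with a dictionary keyed by a hash of the foreign code. The class name, the use of SHA-256, and the `translate` callback are assumptions for illustration; the abstract only says the index is derived by hashing the foreign code or from its storage location.

```python
import hashlib


class TranslationDatabase:
    """Cache of previously translated code, indexed by a hash of the foreign code."""

    def __init__(self, translate):
        self._translate = translate   # hypothetical foreign-to-host translator
        self._entries = {}            # index -> (foreign code, host code)
        self.translations = 0         # how many times translation actually ran

    def host_code_for(self, foreign_code: bytes) -> bytes:
        index = hashlib.sha256(foreign_code).hexdigest()
        entry = self._entries.get(index)
        # Correspondence check: the stored foreign code must match exactly.
        if entry is not None and entry[0] == foreign_code:
            return entry[1]
        host_code = self._translate(foreign_code)
        self.translations += 1
        self._entries[index] = (foreign_code, host_code)
        return host_code
```

Repeated requests for the same foreign code return the stored host code without invoking the translator again, which is where the time saving comes from.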

34 Method for obtaining object code
Russian Federation Patent No. 2206119

High-performance circuit design

35 Level transfer circuit for LVCMOS applications
United States Patent No. 6,265,896
Abstract

A fully static level translation circuit having a standby power close to zero. The level translation circuit translates the voltage level of an input signal having a first voltage level to form an output signal having a second voltage level. The translation circuit comprises an input stage having logic to receive the input signal having the first voltage level and to create a first stage output signal, an output stage having logic to receive the first stage output signal and produce the output signal having the second voltage level, and a reset stage having logic to receive the first stage output signal and the output signal and to produce a reset stage output signal that is coupled to the output stage.

36 Method and apparatus for adjusting the static thresholds of the CMOS circuits
United States Patent No. 6,313,691
Abstract

An apparatus for adjusting static thresholds of CMOS circuits. The apparatus includes a low reference circuit including at least one n-channel MOS device having a back gate and a high reference circuit including at least one p-channel MOS device having a back gate. A feedback loop is provided for providing a control voltage to the back gate of the n-channel MOS device while a second feedback loop is provided for providing a second control voltage to the back gate of the p-channel MOS device. A control voltage is applied to the first feedback loop while a control voltage is applied to the second feedback loop. The output of the low reference circuit is coupled to the first feedback loop and the output of the high reference circuit is coupled to the second feedback loop.

37 Efficient Half-Cycle Clocking Schemes for Self-Reset Circuits
United States Patent No. 6,323,688
Abstract

A pipelined domino architecture includes pairs of pipeline stages each comprising a first active clocked stage and a number of subsequent self-reset logic gates. Each pipeline stage is clocked by one or the other phase of a clock signal. Each active clocked stage and self-reset logic gate of any particular pipeline stage includes a reset circuit to reset the output of such stage or gate at the conclusion of an evaluation period that is initiated by a phase of the clock signal. Only the active clocked stage is clocked; the self-reset logic stages rely upon the reset of the output of the active clocked stage to generate the necessary reset signals that will reset their respective outputs.

38 System for improving LVMOS performance
United States Patent No. 6,320,446
Abstract

A system for increasing the speed and noise immunity of signals transmitted in low voltage CMOS applications. The system includes a transmission device for transmitting a signal in a CMOS circuit, wherein the CMOS circuit includes a high voltage power supply and a low voltage power supply and the signal is transmitted between first and second portions of the CMOS circuit that are coupled to the low voltage power supply. The transmission device comprises a transistor having a gate, drain and source terminals, wherein the drain terminal is coupled to the first portion of the CMOS circuit to receive the signal, and the source terminal is coupled to the second portion of the CMOS circuit and a gate controller coupled to the high voltage power supply and providing a gate control signal coupled to the gate terminal, wherein the gate controller may provide a level approximately equal to the high voltage power supply to the gate terminal via the gate control signal, so that the transistor connects the drain and source terminals.

39 High-speed sense amplifier capable of cascode connection
United States Patent No. 6,351,155
Abstract

A clocked CMOS sense amplifier for high speed latching of low voltage complementary signals. The present invention includes a sense amplifier having a controlled cross-coupled transistor structure, a control circuit, a current source, a recovery transistor and protective transistors. A CORE circuit is provided which may be used to form different logic structures. Two large n-channel transistors in a discharging chain are used in combination with the small capacitances of the cross-coupled nodes to provide maximum speed and high output.

40 High-speed low-power data transfer scheme
United States Patent No. 6,366,130
Abstract

A data transfer arrangement. The data transfer arrangement includes two active pull up/active pull down bus drivers and a voltage precharge source. A differential bus is coupled to the bus drivers and to the voltage precharge source. A latching sense amplifier is coupled to the differential bus and serves as the bus receiver. The bus drivers operate in a precharge phase and a data transfer phase. The bus receiver operates in an analogous but opposite manner, i.e., when the bus drivers are in the precharge phase, the bus receiver is in the data transfer phase and when the bus drivers are in the data transfer phase, the bus receiver is in a precharge phase.

41 Power supply control circuit for LVCMOS
United States Patent No. 6,373,149
Abstract

A power system for controlling power to low voltage CMOS circuits. The power system can be used in circuits having a low voltage supply and a high voltage supply, wherein the low voltage supply powers low voltage circuit components and the high voltage supply powers high voltage circuit components. The power system comprises a first switch coupled between the low voltage supply and the low voltage circuit components, a second switch coupled between the low voltage circuit components and a circuit ground, and a power control circuit coupled to the high voltage supply and the circuit ground and having a control output coupled to the first and second switches, wherein when the control output is in a first state the low voltage supply and the circuit ground are connected to the low voltage circuit components and when the control output is in a second state the low voltage supply and the circuit ground are disconnected from the low voltage circuit components.

42 High-speed low-power sense amplifying half-latch and apparatus thereof for small-swing differential logic (SSDL)
United States Patent No. 6,424,181
Abstract

A high-speed sense amplifier includes a pair of cross-coupled inverters coupled to intermediate nodes and then to differential input nodes by a control circuit. The intermediate nodes are coupled together by an accelerator transistor that forms a current path when the sense amplifier is placed in a sensing state to provide parallel discharge paths for one or the other of the output nodes. During precharge, the accelerator transistor operates to equalize the intermediate nodes to ready them for the next sense phase.

43 Basic amplifying element of differential dynamic logic
Russian Federation Patent No. 2154338

44 Logic pipeline device
Russian Federation Patent No. 2175811

 
Tel: +7 (495) 363-9665 | Fax: +7 (495) 363-9599 | E-mail: mcst@mcst.ru