# Exploring Different Approximate Adder Architecture Implementations in a 250 °C SOI Technology

R. Nowosielski, J. Hartig, G. Payá-Vayá, and H. Blume Institute of Microelectronic Systems, Leibniz Universität Hannover Appelstr. 4, 30167 Hannover, Germany Email: {nowosielski, hartig, guipava, blume}@ims.uni-hannover.de

A. García-Ortiz Institute of Electrodynamics and Microelectronics University of Bremen, Bremen, Germany agarcia@item.uni-bremen.de

Abstract-Designing VLSI circuits for high temperature applications requires the use of specialized ASIC technologies suited for operation above 250 °C. The technologies available today offer computation performance and integration density far below those of state-of-the-art VLSI technologies. Especially at high temperatures, the switching speed of integrated circuits becomes very slow. Thus, the design space for implementing digital signal processing architectures at high temperature is restricted. This paper presents and evaluates different adder architectures used in approximate computing and stochastic arithmetic, which is new in the context of high temperature ASIC design. The goal is to analyze the error characteristics of these adders under out-of-specification operation such as high temperature and frequency overscaling. Using these results, the best adder architecture for out-of-specification operation in a given application can be identified. The presented work describes the first step of a two-fold evaluation process: it reports results from gate-level simulation and introduces the chip design, approved for fabrication, that will be used for real-world experiments.

Keywords-High Temperature; SOI; Approximate; ASIC

## I. INTRODUCTION

Stochastic computing [1], [2] has recently emerged as a promising approach for designing energy-efficient embedded hardware systems, taking the ability of many applications to tolerate a loss of quality or full precision in the computed results into account. Rather than hiding hardware implementation constraints under expensive guard-bands, designers can relax the traditional correctness constraints and deliberately expose hardware variability, obtaining significant processing performance improvements and energy benefits. Typical hardware constraints are those related to the maximum operation frequency (i.e., timing constraints) and power supply (i.e., operation voltage).

Designing imprecise hardware systems to minimize the impact of the functional degradation is one of the main goals of stochastic computing. Understanding all these design trade-offs concerning the resulting imprecise computation is mandatory and domain specific. Therefore, stochastic computation advocates an explicit characterization and exploitation of error statistics.

Approaches for stochastic arithmetic as well as approximate computing in VLSI circuits for high temperature have not been published yet. Thus, no characterization of the performance or basic functionality of specialized arithmetic architectures exists. In this paper, the stochastic computing approach is studied on a 250 °C high temperature SOI technology [3], where the path delay of the implemented circuits increases significantly with rising temperature. For that, a complete chip design for a high temperature technology is first evaluated at gate level. This chip design includes different types of 16-bit adder implementations, which are analyzed in terms of error rate for two different temperature corners and under frequency overscaling by performing gate-level simulations after layout implementation. Moreover, two different types of approximate adders [4], which were originally conceived as deterministic designs that produce imprecise results, are also included. The goal of this work is to analyze the error characteristics of these adders over a range of temperatures and under frequency overscaling. For a precise analysis, the presented chip design has been approved for fabrication. Therefore, the results presented in this paper will be complemented with results from comprehensive experiments after fabrication.

This paper is organized as follows: Section II explains all adder architectures that have been selected for evaluation and ASIC implementation. Section III gives some background information on high temperature VLSI technologies and describes the overall design concept. Section IV evaluates the results of the designed stochastic ASIC at gate level. Finally, this paper is concluded in Section V.

## II. APPROXIMATE ADDER BACKGROUNDS

### A. Precise Adders (PA)

In the following, three well-known precise adder (PA) architectures [5] are described. They serve as a reference for the approximate adder implementations with respect to delay, silicon area, and error characteristics under frequency and temperature overscaling. All adders throughout this paper have a word length of 16 bit, which is an appropriate trade-off between sufficient precision and silicon area within the targeted application of high temperature electronics.

From all available adder architectures, the **ripple-carry-adder** (**RCA**) is the most straightforward one. It is built up by connecting several full adder cells (1-bit basic adders for sum and carry) to a long chain (see Fig. 1a), which makes its critical path delay a linear function of its bit length.

Figure 1. (a) 16-bit ripple-carry-adder (RCA), (b) 16-bit almost correct adder (ACA) with maximal carry propagation length of 4 [6].

Figure 2. (a) 16-bit carry-select adder (CSA), (b) 16-bit carry-lookahead-adder (CLA) [5].

In order to reduce the critical path, a **carry-select-adder** (**CSA**) architecture can be chosen as depicted in Fig. 2a. This approach divides the RCA into smaller sub-adders and calculates the sum for both possible carry-in signals. A multiplexer tree then selects the correct sub-sums resulting in a shorter critical path than the RCA but requiring more silicon area.

A third alternative standard adder is the **carry-lookahead-adder** (**CLA**). By calculating carry-generate and propagate signals and a tree-structure combining them, the critical path becomes a logarithmic function of the bit length (see Fig. 2b).
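The three precise adders above can be contrasted with a small bit-level behavioral sketch. This is not the authors' RTL: the function names, the 4-bit block size of the carry-select model, and the Kogge-Stone-style parallel-prefix scheme used for the carry-lookahead model are assumptions chosen for illustration, and the structure in Fig. 2b may differ. All three functions return the 16-bit sum with the carry-out as bit 16.

```python
WIDTH = 16  # word length used throughout the paper


def rca_add(a: int, b: int, width: int = WIDTH, cin: int = 0) -> int:
    """Ripple-carry adder: one full adder per bit, carry ripples upward."""
    carry, result = cin, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i           # full-adder sum bit
        carry = (ai & bi) | (carry & (ai ^ bi))    # full-adder carry-out
    return result | (carry << width)               # carry-out becomes bit `width`


def csa_add(a: int, b: int, width: int = WIDTH, block: int = 4) -> int:
    """Carry-select adder; `width` must be a multiple of `block` (assumed 4)."""
    bmask = (1 << block) - 1
    carry, result = 0, 0
    for lo in range(0, width, block):
        xa, xb = (a >> lo) & bmask, (b >> lo) & bmask
        sum0 = rca_add(xa, xb, block, cin=0)       # sub-sum for carry-in 0
        sum1 = rca_add(xa, xb, block, cin=1)       # sub-sum for carry-in 1
        chosen = sum1 if carry else sum0           # multiplexer selects one
        result |= (chosen & bmask) << lo
        carry = chosen >> block
    return result | (carry << width)


def cla_add(a: int, b: int, width: int = WIDTH) -> int:
    """Carry-lookahead via parallel prefix: log2(width) combine steps."""
    g, p = a & b, a ^ b          # per-bit generate and propagate signals
    p0 = p                       # keep the per-bit propagate for the sum
    d = 1
    while d < width:
        g |= p & (g << d)        # group generate over a span of 2*d bits
        p &= p << d              # group propagate over a span of 2*d bits
        d *= 2
    return p0 ^ (g << 1)         # carry into bit i is the group generate of bit i-1


if __name__ == "__main__":
    for add in (rca_add, csa_add, cla_add):
        print(add.__name__, hex(add(0xFFFF, 0x0001)))
```

The loop in `rca_add` runs once per bit, mirroring the linear critical path, whereas the `while` loop in `cla_add` needs only four iterations for 16 bits, mirroring the logarithmic depth.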

### B. Almost-Correct Adders (ACA)

In contrast to the precise adders described above, approximate adder architectures trade speed against accuracy to overcome the theoretical limitations of these precise adder designs, producing acceptable results for many applications. One approach of this kind is the concept of **almost-correct adders (ACA)** presented by the authors of [6]. By limiting the maximal carry propagation length, these adders still provide correct results for most input operand combinations, whereas their inaccuracy is limited to a small remaining probability.

The longest propagation of the carry within an addition can be modeled by a coin toss experiment as described in [7] using a recursive function. For 16-bit additions, the total number of input operand combinations where the carry propagation length $CP$ is less than or equal to different threshold values $CP_{th}$ is presented in Table I. From this, an average ACA accuracy can be calculated. By using adders with limited carry-propagation lengths, the critical path delay can be decreased at a rather small drop in average accuracy. Due to this relationship, the ACA errors belong to the class of **infrequent large magnitude (ILM) errors** [8].

Table I CARRY-PROPAGATION THRESHOLD AND ACA ACCURACY FOR 16-BIT ADDITIONS.

| Carry-propagation threshold $CP_{th}$ | Number of input combinations with $CP \leq CP_{th}$ | ACA accuracy |
|---------------------------------------|------------------------------------------------------|--------------|
| 4                                     | 52656                                                | 80.35%       |
| 6                                     | 62725                                                | 95.71%       |
| 8                                     | 64960                                                | 99.12%       |
| 10                                    | 65424                                                | 99.83%       |
| 12                                    | 65516                                                | 99.97%       |
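The counts in Table I can be cross-checked with a brute-force sketch. It rests on an assumed reading of the table: the "input combinations" are taken to be the $2^{16}$ possible patterns of the propagate signal $a \oplus b$, and $CP$ is the longest run of ones in that pattern, which corresponds to the longest run of heads in the coin toss model of [7]. The function names are hypothetical; the paper itself uses a recursive function rather than enumeration.

```python
WIDTH = 16  # word length of the adders


def longest_run_of_ones(pattern: int, width: int = WIDTH) -> int:
    """Length of the longest run of consecutive 1 bits in `pattern`."""
    run = best = 0
    for i in range(width):
        if (pattern >> i) & 1:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


# Longest propagate run for every possible 16-bit propagate pattern.
RUNS = [longest_run_of_ones(p) for p in range(1 << WIDTH)]


def count_patterns(cp_th: int) -> int:
    """Number of patterns whose longest propagate run is at most `cp_th`."""
    return sum(1 for r in RUNS if r <= cp_th)


if __name__ == "__main__":
    for cp_th in (4, 6, 8, 10, 12):
        n = count_patterns(cp_th)
        print(f"CP_th = {cp_th:2d}: {n:5d} patterns, {100 * n / 2**WIDTH:.2f}%")
```

Under this reading, the enumeration yields the same five counts as listed in Table I.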

A possible implementation of an ACA with a maximum carry propagation length of 4 bit is shown in Fig. 1b. For 4-bit carry propagation, 6-bit PAs, e.g., using RCAs, are required. This adder architecture is referred to as *ACA-RCA4* in the following. To mitigate the high degree of redundant computation within this implementation, a tree-like area saving method described in [6] was used. A further modification of this adder examined in this paper replaces the least significant RCA by a bitwise OR of the two operands. This variant is referred to as *ACA-OR4*.
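The effect of limiting the carry propagation can be illustrated with a functional sketch. It is a behavioral model only and does not reproduce the area-saving tree structure of [6] or the ACA-OR4 variant. As an assumption consistent with the 6-bit sub-adders mentioned above, each sum bit $i$ is computed by an exact adder over the window of bits $i - CP_{th} - 1$ to $i$, with the carry into the window taken as zero. The name `aca_add` and its parameters are hypothetical.

```python
def aca_add(a: int, b: int, cp: int = 4, width: int = 16) -> int:
    """Almost-correct addition with carry propagation limited to `cp` bits.

    Returns the `width`-bit sum without carry-out. Sum bit i only sees the
    operand bits inside its window [i - cp - 1, i] (assumed window layout).
    """
    result = 0
    for i in range(width):
        lo = max(0, i - cp - 1)                 # lowest bit inside the window
        mask = (1 << (i - lo + 1)) - 1          # window is at most cp + 2 bits wide
        partial = ((a >> lo) & mask) + ((b >> lo) & mask)
        result |= ((partial >> (i - lo)) & 1) << i
    return result


if __name__ == "__main__":
    a, b = 0x00FF, 0x0001                       # carry must ripple through 7 bits
    exact = (a + b) & 0xFFFF
    approx = aca_add(a, b, cp=4)
    print(f"exact = {exact:#06x}, ACA = {approx:#06x}, |error| = {abs(exact - approx)}")
```

In this model the result is exact whenever the longest run of propagate bits in $a \oplus b$ does not exceed `cp`. Longer carry chains, as in the example, lose the carry into the upper bits, which is why the resulting errors are rare but large.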

Error detection and correction mechanisms to compensate the ACA drop of accuracy are described in literature (e.g., [6]), but those are out of the scope of this paper, since the focus lies in the investigation of pure adder circuits and the inferred computation errors due to frequency and temperature overscaling.

### C. Error-Tolerant Adders (ETA)

Another type of approximate adder is the so-called **error-tolerant adder (ETA)** presented by the authors of [9]. The ETA divides the input operands into two parts using a parameter *m*. Those parts are the *accurate* part for the MSBs of the operands and the *inaccurate* part for the LSBs (see Fig. 3).

Figure 3. 16-bit error-tolerant-adder (ETA-FIX-m) [9].

The accurate part can be implemented using either an RCA or another PA architecture, which limits the error magnitude to an upper bound. The inaccurate part is implemented as a carry-free adder using just XOR gates to compute the sum bits. Whenever there is no carry, the sum bit is accurate. In addition, a control block generates a control signal whenever both input bits $a_i$ and $b_i$ are 1 (carry generate). This control signal is then back-propagated from left to right, setting all sum bits to 1, which rounds the result up. Due to that, the ETA usually produces wrong results, but only with a small error magnitude, so its errors belong to the class of **frequent small magnitude (FSM) errors** [8]. The delay of the ETA architecture is determined by the inaccurate part. In the following, this adder is called *ETA-FIX-m*, depending on the size *m* (in bits) of the accurate part.
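The ETA-FIX-m behavior described above can be sketched as a functional model. Two points are assumptions that the text does not spell out: the accurate part receives no carry from the inaccurate part, and the control signal sets the bit position where both inputs are 1 as well as all less significant bits to 1. The name `eta_fix_add` is hypothetical.

```python
def eta_fix_add(a: int, b: int, m: int, width: int = 16) -> int:
    """Error-tolerant addition with `m` accurate MSBs and width - m inaccurate LSBs.

    Returns the sum including the carry-out of the accurate part.
    """
    k = width - m                                # size of the inaccurate part
    high = ((a >> k) + (b >> k)) << k            # accurate part, exact, no carry-in
    low = 0
    for i in range(k - 1, -1, -1):               # scan from the MSB of the LSB part
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai & bi:                              # carry generate: control signal
            low |= (1 << (i + 1)) - 1            # force this bit and all lower bits to 1
            break
        low |= (ai ^ bi) << i                    # carry-free XOR sum bit
    return high | low


if __name__ == "__main__":
    a, b = 0x00F0, 0x0010
    print(f"exact = {a + b:#06x}, ETA-FIX-8 = {eta_fix_add(a, b, m=8):#06x}")
```

In this model the deviation from the exact sum always stays below $2^{16-m}$, which illustrates why the errors are frequent but small.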

The authors of [10] present a modification of this adder, which is called *ETA-FLEX* in the following. This approach dynamically shifts the border *m* between the accurate and the inaccurate part depending on the value of the input operands. This results in a higher accuracy for smaller input values, but also leads to an even longer critical path than that of an RCA.

## III. STOCHASTIC ASIC

### A. 250 °C SOI Technology Backgrounds

Integrated electronics for digital signal processing require ASIC technologies that are optimized for low leakage currents at high temperatures. Available VLSI technologies for temperatures around 250 °C are based on Silicon-On-Insulator (SOI) technology [11]. Integrated circuits built in standard bulk-CMOS technologies start to malfunction at higher temperatures due to increased leakage current from the reverse-biased drain and source diodes towards the substrate, as indicated in Fig. 4a. The insulation layer of buried oxide in SOI technologies, shown in Fig. 4b, reduces these leakage currents significantly. This allows operation of SOI transistors beyond 300 °C, as evaluated for instance in [12].

However, the drawback of these high temperature technologies is the low switching speed of digital circuits as well as the low gate density per unit area. Fig. 5 compares available high temperature technologies to a moderate bulk-CMOS 130 nm technology. As can be seen from the synthesis results, high temperature circuits are both larger and slower by nearly a factor of 100. Nevertheless, there are efforts to shrink high temperature electronic technology to smaller feature sizes [13], [14].

### B. ASIC Architecture

Figure 4. Principle drawing of technological alternatives for implementation of an n-MOS transistor.

Figure 5. Synthesis results of selected arithmetic blocks for a performance characteristic of high temperature technologies. FP - Floating Point, SP - Single Precision, MUL16 - 16-bit multiplier, ADD16 - 16-bit adder

In order to examine the different adder architectures introduced in Section II, this paper proposes the VLSI design that is presented at register-transfer level (RTL) in Fig. 6 and has been taped out for manufacturing according to the layout shown in Fig. 7. The core of the design is an array of 16 different 16-bit adders. Due to pin count limitations, the output of each adder is multiplexed to 17 output pins of the ASIC. For stimulation of the adder core, two different approaches have been implemented:

- *External Mode*. In this mode, seven 8-bit registers form a shift register. External 8-bit data are captured in the first register, while the data from the previous cycle are passed to the subsequent register. The registers marked with a capital letter *D* in Fig. 6 are the effective data registers. They drive the adder inputs, as indicated by the virtual registers for the operands A and B. The transition registers marked with a capital letter *T* are required for setting up a defined transition in the next clock cycle for each operand bit. This way, bit errors that depend on the previous operand state can be observed and distinct transitions can be controlled. In this mode, the ASIC can process one evaluation step every eight clock cycles (seven cycles setup, one addition) if transition conditions are to be considered.
- *LFSR Mode.* In the linear feedback shift register mode, the input registers marked with the capital letter *L* in Fig. 6, each 8 bit wide, are interconnected. This interconnection implements the required feedback according to a Galois LFSR architecture. The data input is disabled. With each clock cycle, the resulting 32-bit LFSR steps one bit position forward while creating a new pseudo-random value for both adder operands. In this mode, the adders can evaluate one pseudo-random vector per clock cycle.
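The LFSR mode can be mimicked in software with a minimal sketch of a 32-bit Galois LFSR. The paper does not state the feedback polynomial or how the 32 state bits map onto the operands, so both are assumptions here: the tap mask `0x80200003` corresponds to the commonly tabulated polynomial $x^{32} + x^{22} + x^{2} + x + 1$, and the upper and lower 16 bits of the state are taken as operands A and B. All names are hypothetical.

```python
from typing import Iterator, Tuple

TAPS = 0x80200003  # assumed feedback mask; the polynomial is not given in the paper


def lfsr_step(state: int) -> int:
    """Advance a 32-bit Galois LFSR (right-shifting form) by one bit position."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= TAPS          # feedback is applied to all tap positions at once
    return state


def operands(state: int) -> Tuple[int, int]:
    """Assumed mapping: upper 16 bits drive operand A, lower 16 bits operand B."""
    return (state >> 16) & 0xFFFF, state & 0xFFFF


def stimuli(seed: int, n: int) -> Iterator[Tuple[int, int]]:
    """Yield `n` pseudo-random operand pairs, one per simulated clock cycle."""
    state = seed
    for _ in range(n):
        state = lfsr_step(state)
        yield operands(state)


if __name__ == "__main__":
    for a, b in stimuli(seed=1, n=4):
        print(f"A = {a:#06x}, B = {b:#06x}")
```

The seed must be non-zero, since the all-zero state maps onto itself.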

In order to measure the performance degradation of each adder, the test setup after fabrication will expose the ASIC to high temperature, reduced supply voltage, and increased system clock frequency. Because all of these effects have an impact on the signal propagation time inside the VLSI circuit in particular, additional care was taken during design to implement the input capture circuitry and the output multiplexer stage, i.e., the regions of the ASIC that are not part of the adder core, with a wide safety margin regarding slack time.

## IV. EVALUATION

As mentioned before, the presented chip design has not been fabricated yet. Thus, the chip design has been evaluated by performing a gate-level simulation. The netlist originates from the backend flow in Cadence Encounter and contains gate delays as well as wiring delays in SDF format. Using Modelsim from Mentor Graphics, each adder was stimulated with 50,000 random samples and additional samples at the word length boundary. The results were compared to the correct sum as reference by an automatic evaluation script outside Modelsim.

Figure 6. RTL diagram of implemented high temperature stochastic ASIC.

Figure 7. Plot of ASIC layout submitted for manufacturing. The die size is $3.37\,\mathrm{mm}\times3.35\,\mathrm{mm}$.

In Fig. 8, error occurrences are plotted for each adder. The K-factor corresponds to the frequency overscaling. This is normalized to the propagation delay  $T_{critical}$  of the critical path for the complete chip design:

$$K = \frac{T_{critical}}{T_{overscale}} = \frac{f_{overscale}}{f_{max}} \tag{1}$$

To quantify the absolute computation error of the sum, the following measure is introduced:

$$\epsilon = \operatorname{ld} \left| sum - reference \right|$$

Here, $\operatorname{ld}$ denotes the base-2 logarithm. Thus, $\epsilon$ indicates how large the inaccuracy of the computation is for a given overscaling factor. In order to get the characteristic error distribution for each adder design, an $\epsilon$ histogram is computed for each *K*-factor and temperature as shown in Fig. 8.
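The two measures can be sketched in a few lines. This is not the authors' evaluation script. The function names are hypothetical, and two choices are assumptions: error-free samples are excluded from the histogram because the logarithm of zero is undefined, and $\epsilon$ is binned to its integer part.

```python
import math
from collections import Counter
from typing import Iterable, Optional, Tuple


def k_factor(f_overscale: float, f_max: float) -> float:
    """Frequency overscaling factor K = f_overscale / f_max."""
    return f_overscale / f_max


def epsilon(total: int, reference: int) -> Optional[float]:
    """epsilon = ld |sum - reference|; None for an error-free result."""
    err = abs(total - reference)
    return math.log2(err) if err else None


def epsilon_histogram(samples: Iterable[Tuple[int, int]]) -> Counter:
    """Count erroneous samples per integer bin floor(epsilon) (assumed binning)."""
    hist = Counter()
    for total, reference in samples:
        err = abs(total - reference)
        if err:
            hist[err.bit_length() - 1] += 1      # floor(ld err) for err >= 1
    return hist


if __name__ == "__main__":
    # Hypothetical (sum, reference) pairs for one adder at one K-factor.
    samples = [(0x00FF, 0x0100), (0x00C0, 0x0100), (0x1335, 0x1335)]
    print("K =", k_factor(f_overscale=200e6, f_max=100e6))
    print("histogram:", dict(epsilon_histogram(samples)))
```

One such histogram per *K*-factor and temperature corner gives the kind of error distribution the text describes.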

The following results can be extracted from Figs. 8 and 10:

- The first error occurrences in the RCA design correspond to high magnitude errors, as expected. In contrast, the CLA design breaks at a later *K*-factor with a broad range in error magnitude. The benefit of increased maximum operation frequency and error distribution is traded off against a significantly increased cell area requirement.
- The ACA adders present small numbers of error occurrences but with a higher magnitude. The error occurrences for a *K*-factor smaller than 2 are 0.022%, 0.078%, 0.474%, 2.132% and 10.028% for ACA\_RCA\_12, ACA\_RCA\_10, ACA\_RCA\_8, ACA\_RCA\_6, and ACA\_RCA\_4, respectively. These kinds of approximate adders present an even higher area requirement than the CLA adder. A variation with a logic OR-network replacing the RCA is also evaluated as a suggestion for silicon area reduction. This reduction in area is traded off against an increased occurrence of small magnitude errors.
- The ETA adders present a high number of small magnitude errors. These kinds of approximate adders are even smaller than an RCA design. Only the ETA\_FLEX adder is bigger due to its previously discussed flexible adaptation to smaller input values.

The effect of decreasing the temperature to 175 °C is presented in Fig. 9. As can be seen from the exemplarily selected adders, the error distribution remains nearly the same but shifts to a higher *K*-factor. This effect can be explained by observing that the path delays decrease almost linearly with the temperature. Accordingly, the shape of the path delay distribution of each adder in Fig. 11a and 11b remains nearly unchanged, but the time axis scales by a factor of 1.7 towards the higher temperature.

Figure 8. Characteristic of error occurrences vs. magnitude of absolute error and frequency overscaling for each implemented adder at 250 °C.

Figure 9. Characteristic of error occurrences vs. magnitude of absolute error and frequency overscaling for each implemented adder at 175 °C.

## V. CONCLUSIONS

Different adder architectures for stochastic computing have been evaluated for their characteristics in high temperature VLSI implementation. The results were obtained from gate-level simulation of a chip design. It has been shown that the ACA adder is particularly well suited for applications where infrequent high magnitude errors are acceptable. In contrast, the ETA requires less area while exhibiting frequent but small magnitude errors. The path delay increases significantly at high temperature: for the presented change from 175 °C to 250 °C, it rises by a factor of 1.7. The presented results can be used to develop high temperature applications which can be operated beyond the specified maximum temperature range with reduced but not broken adder arithmetic. The selection of the best suited adder architecture depends on the application.

Further verification of the presented simulation results will be carried out by experiments with this ASIC after fabrication. Moreover, the results obtained from these experiments can be further used for analyzing complex systems which include adders in the critical path, like CORDIC processors, for high temperature applications.



Figure 10. Cell area for each evaluated adder after synthesis.



(a) Path delays for the evaluated adders after place & route for  $175\ ^{\circ}\mathrm{C}$  corner.



(b) Path delays for the evaluated adders after place & route for 250  $^{\circ}\mathrm{C}$  corner.

Figure 11. Path delays at 175 °C and 250 °C.

## REFERENCES

- [1] C. Kirsch and H. Payer, "Incorrect systems: it's not the problem, it's the solution," in *Proc. of the 49th Annual Design Automation Conference*. ACM, 2012, pp. 913–917.
- [2] J. Sartori, J. Sloan, and R. Kumar, "Stochastic computing: Embracing errors in architecture and design of processors and applications," in *Compilers, Architectures and Synthesis for Embedded Systems (CASES), 2011 Proc. of the 14th Int. Conf. on.* IEEE, 2011, pp. 135–144.
- [3] Fraunhofer Institute. (2014, Nov.) High Temperature SOI Technology H10. [Online]. Available: http://www.ims.fraunhofer.de
- [4] J. Han and M. Orshansky, "Approximate computing: An emerging paradigm for energy-efficient design," in *Test Symposium (ETS), 2013 18th IEEE European*. IEEE, 2013, pp. 1–6.
- [5] B. Parhami, *Computer arithmetic: algorithms and hardware designs*. Oxford University Press, Inc., 2009.
- [6] A. K. Verma, P. Brisk, and P. Ienne, "Variable latency speculative addition: A new paradigm for arithmetic circuit design," in *Proceedings of the conference on Design, automation and test in Europe.* ACM, 2008, pp. 1250–1255.
- [7] M. F. Schilling, "The longest run of heads," *College Math. J*, vol. 21, no. 3, pp. 196–207, 1990.
- [8] J. Huang and J. Lach, "Exploring the fidelity-efficiency design space using imprecise arithmetic," in *Proceedings of the 16th Asia and South Pacific Design Automation Conference*. IEEE Press, 2011, pp. 579–584.
- [9] N. Zhu, W. L. Goh, W. Zhang, K. S. Yeo, and Z. H. Kong, "Design of low-power high-speed truncation-error-tolerant adder and its application in digital signal processing," *Very Large Scale Integration (VLSI) Systems, IEEE Transactions on*, vol. 18, no. 8, pp. 1225–1229, 2010.
- [10] N. Zhu, W. L. Goh, and K. S. Yeo, "Ultra low-power high-speed flexible probabilistic adder for error-tolerant applications," in *SoC Design Conference (ISOCC), 2011 International*. IEEE, 2011, pp. 393–396.
- [11] J.-P. Colinge and J. Colinge, Silicon-on-Insulator technology: materials to VLSI. Springer, 2004, vol. 3.
- [12] K. Grella, S. Dreiner, A. Schmidt, W. Heiermann, H. Kappert, H. Vogt, and U. Paschen, "High temperature characterization up to 450 °C of MOSFETs and basic circuits realized in a silicon-on-insulator (SOI) CMOS technology," *Journal of microelectronics and electronic packaging*, vol. 10, no. 2, pp. 67–72, 2013.
- [13] L. Vancaillie, V. Kilchytska, P. Delatte, H. Matsuhashi, F. Ichikawa, D. Flandre *et al.*, "0.15 μm fully depleted SOI for mixed-signal applications up to 250 °C: Are we approaching the limits of device scaling for high-temperature electronics?" in 2003 International Conference on High Temperature Electronics (HITEN 2003), 2003.
- [14] Fraunhofer Institute. (2014, Nov.) High Temperature SOI Technology H035. [Online]. Available: http://www.ims.fraunhofer.de