# Two-Dimensional Schemes for Clocking/Timing of QCA Circuits

Vamsi Vankamamidi, Student Member, IEEE, Marco Ottavi, Member, IEEE, and Fabrizio Lombardi, Senior Member, IEEE

Abstract—At the nanoscale, quantum-dot cellular automata (QCA) defines a new device architecture that permits the innovative design of digital systems. Features of these systems include the ability to cross signal lines of different polarization orientation on a Cartesian plane, the potential for high throughput due to efficient pipelining, and fast signal switching and propagation. However, QCA designs of even modest complexity suffer from the placement of long lines of cells in clocking zones, which results in increased delay, slow timing, and sensitivity to thermal fluctuations. In this paper, different schemes for clocking and timing of QCA systems are proposed; these schemes use 2-D techniques that reduce the longest line length in each clocking zone. The proposed clocking schemes utilize logic-propagation techniques that have been developed for systolic arrays. Placement of QCA cells is modified to ensure correct signal generation and timing. The significant reduction in the longest line length permits fast timing and efficient pipelining while guaranteeing kink-free switching behavior.

Index Terms—Clocking, emerging technology, nanotechnology, quantum-dot cellular-automata (QCA) architecture, timing.

#### I. INTRODUCTION

IN THE PAST few decades, the exponential scaling in feature size and the increase in processing power have been successfully achieved by very large scale integration (VLSI) technology, mostly using CMOS; however, in the not-so-distant future [1], this technology will face serious challenges as the fundamental physical limits of its devices are reached. In recent years, there has been extensive research at the nanoscale into so-called emerging technologies that could supersede conventional CMOS. It is anticipated that these fundamentally different technologies can achieve extremely high densities and high operating speeds. Among these new devices, quantum-dot cellular automata (QCA) not only offers a solution at the nanoscale but also provides a new method of computation and information transformation (often referred to as processing-in-wire). In terms of feature size, it is projected that a QCA cell a few nanometers in size can be fabricated through a molecular implementation by a self-assembly process [2], [3]. Sequential as well as combinational designs can be realized using QCA. Designs based on QCA (such as carry-look-ahead adders, barrel shifters, microprocessors, and field-programmable gate arrays) have been presented in the technical literature [4]–[9]. Solutions to the problem of driving the inputs to a QCA system and measuring the output are discussed in [10] and [11].

The authors are with the Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115 USA (e-mail: vvankama@ece.neu.edu; mottavi@ece.neu.edu; lombardi@ece.neu.edu).

Digital Object Identifier 10.1109/TCAD.2007.907020

Physical design and placement of QCA cells present new challenges and opportunities. For example, the ability to cross QCA wires on a plane provides an additional advantage over conventional CMOS technology. As signals in QCA are not propagated by current and voltage but through the Coulombic interaction of electrons, they can cross each other with no interference (it has been verified that a line of rotated cells does not interfere with a line of straight cells [5], [12]). However, long lines of QCA cells also result in an increased delay for signal propagation and switching, which can significantly reduce the operating speed of circuits manufactured in this technology. Currently, signal propagation in QCA systems is mostly accomplished along serial timing zones as a 1-D technique, as proposed in [12]. The 1-D arrangement results from the four phases (adiabatic switching) required for timing the QCA cells. A trapezoid arrangement that exploits the four-phase timing of [12] has been proposed in [13] to accomplish a higher cell density through feedback paths. Long vertical lines consisting of many QCA cells are commonly required to route signals [13], thus imposing stringent timing constraints on the pipelining process. Moreover, correct switching among cells (i.e., kink-free operation) in a timing zone is affected by thermal fluctuations [14]. The operating temperature of QCA, as well as the required clocking circuitry, depends on the length of the longest wire and the size of the timing zone; i.e., QCA operation at room temperature most likely requires short lines.

In this paper, we consider issues pertaining to the timing and clocking of QCA systems for high-performance computing. Initially, the effects of thermal fluctuations on QCA designs are studied to establish clocking-zone dimensions as a function of the longest QCA wire. Unfortunately, we show that high performance and low temperature require a different mechanism than the 1-D clocking criteria proposed in [12] due to significant delays. To address this problem, two novel schemes are proposed for timing and clocking. These schemes are based on a 2-D characterization of information transfer across different timing zones arranged into grids. Issues such as the clocking circuitry (as interfaced to CMOS) and the operating temperature are also addressed. Novel logic-propagation techniques are also introduced for designs under the proposed clocking schemes. Computation time and pipelining are extensively analyzed as performance metrics. The proposed clocking schemes utilize the equivalence between systolic processing and QCA zone switching, thus permitting sequential or parallel processing of signals across both dimensions of the QCA circuit in a Cartesian plane. Simulation results (using QCADesigner [15]) are provided for combinational and sequential QCA circuits.

Manuscript received May 24, 2006; revised February 6, 2007. This paper was recommended by Associate Editor R. Suaya.

Fig. 1. Spatial configurations as binary behavior of the QCA cell.

This paper is organized as follows. Section II gives a brief review of QCA with particular emphasis on timing and clocking. Section III analyzes the clocking of QCA systems in detail. The first proposed scheme (based on a 2-D partitioning of the design into a grid of zones) is given in Section IV. The second scheme (based on a 2-D wave propagation of signals within a grid of zones) is given in Section V. Section VI addresses feedback paths (as found in sequential circuits) in the 2-D grid and shows that zone partitioning is applicable in this case as well. Section VII presents detailed simulation results. Conclusions are drawn in the last section.

#### II. REVIEW

QCA is a new device architecture that is amenable to the nanometer scale (metal-dot as well as molecular implementations) [2], [16]. QCA stores logic states not as voltage levels but as the positions of individual electrons [5]. A quantum cell can be viewed as a set of four charge containers, or dots, positioned at the corners of a square cell. Computation is realized by the Coulombic interaction of extra electrons in the quantum dots. Each cell is a nanometer-scale square with a well (dot) at each corner. The two extra electrons present in each cell can quantum-mechanically tunnel between wells, but they cannot tunnel out of the cell. Electron repulsion causes the extra electrons to occupy diagonally opposite wells. These two electron configurations can be used to encode binary information in the cells. Fig. 1 shows the QCA cell and the Boolean nature of the polarization for its two electron configurations.

The unique feature of QCA-based designs is that logic states are not stored as voltage levels, as in conventional electronics, but are represented by the positions of individual electrons. Unlike conventional logic, in which information is transferred from one place to another by electrical means, QCA operates by the Coulomb interaction that couples the state of one cell to the states of its neighbors [17]. As no significant current flows (logic operations are due to the polarization of the spatial configurations of the cells), the power dissipation in QCA circuits is low compared with conventional FET-based circuits [5]. This results in a technology in which information transfer (interconnection) is the same as information transformation (logic manipulation), i.e., processing-in-wire is said to occur.

Fig. 2. QCA clock phases.

QCA cells can be arranged to realize different devices such as the binary wire, the inverter, and the majority voter (MV). The basic logic gate in QCA is the MV. The MV, with logic function $\mathrm{MV}(A, B, C) = AB + AC + BC$, can be realized with only five QCA cells (compared with a CMOS implementation that requires 16 transistors). Logic AND and OR functions can be implemented from an MV by permanently setting one input (the programming input) to zero and one, respectively. Cells positioned adjacent to each other tend to align and produce a QCA binary wire. The more cells in a wire, the longer (spatially) the wire and the larger the time delay for signal propagation [12]. Cells positioned diagonally from each other align in an opposite fashion and produce logic complementation (i.e., an inverter).
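The gate derivations above can be checked with a minimal behavioral sketch that models each cell polarization as a 0/1 logic value. It ignores physical layout and timing, and the function names (`mv`, `qca_and`, `qca_or`, `qca_not`) are hypothetical, chosen only for illustration.

```python
def mv(a: int, b: int, c: int) -> int:
    """Majority voter on 0/1 values: MV(A, B, C) = AB + AC + BC."""
    return (a & b) | (a & c) | (b & c)


def qca_and(a: int, b: int) -> int:
    """AND: the programming input is permanently set to 0."""
    return mv(a, b, 0)


def qca_or(a: int, b: int) -> int:
    """OR: the programming input is permanently set to 1."""
    return mv(a, b, 1)


def qca_not(a: int) -> int:
    """Inverter (diagonally offset cells), modeled as logical complement."""
    return 1 - a


if __name__ == "__main__":
    # Truth table of the two gates derived from the MV.
    for a in (0, 1):
        for b in (0, 1):
            print(f"A={a} B={b}  AND={qca_and(a, b)}  OR={qca_or(a, b)}")
```

Fixing one MV input is what makes the MV "programmable": the same five-cell structure yields either gate depending only on a constant input.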

In traditional electronic systems, timing is controlled through a reference signal (i.e., the clock); however, timing in QCA is accomplished by clocking in four distinct and periodic phases [12], [18]. A QCA circuit is partitioned into serial (1-D) zones, and each zone is maintained in a phase. Clocking effectively traps cells of a zone into a specific polarization while permitting cells in adjacent zones to undergo changes. For QCA, the clock phases are switch, hold, release, and relax. During the switch phase, the extra electrons in a cell are polarized under the influence of neighboring cells; in this phase, a cell attains a definite binary value. Interdot barriers are raised in the hold phase so that the electrons do not switch and retain their polarity. The interdot barriers are reduced in the release phase, and cells lose their polarity. In the relax phase, there is no interdot barrier, and a cell has no influence on its neighbors. Fig. 2 shows a cell in its four clock phases. Clocking controls the information (signal) flow and enables power gain in the cells (with no flow of current).
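To make the four-phase discipline concrete, the following sketch assumes a 1-D chain of zones in which zone k lags zone k-1 by exactly one phase (a quarter of the clock period). The names `PHASES`, `phase_of`, and `check_pipeline` are illustrative, not from the paper. The check confirms that whenever a zone is switching, the zone driving it is holding and the zone after it is relaxed, so it has no influence on the switching zone.

```python
PHASES = ("switch", "hold", "release", "relax")


def phase_of(zone: int, step: int) -> str:
    """Phase of a zone at a quarter-period step; zone k lags zone k-1 by one phase."""
    return PHASES[(step - zone) % 4]


def check_pipeline(num_zones: int, steps: int) -> bool:
    """True if every switching zone has a holding predecessor and a relaxed successor."""
    for t in range(steps):
        for z in range(num_zones):
            if phase_of(z, t) != "switch":
                continue
            if z > 0 and phase_of(z - 1, t) != "hold":
                return False
            if z < num_zones - 1 and phase_of(z + 1, t) != "relax":
                return False
    return True


if __name__ == "__main__":
    for t in range(4):
        print(t, [phase_of(z, t) for z in range(6)])
    print(check_pipeline(6, 16))
```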

The timing zones of a QCA circuit are arranged by following the periodic execution of these four clock phases. Computation in QCA is 1-D (i.e., unidirectional and consistent with the direction of signal propagation). With an appropriate layout, feedback paths and a higher cell density are possible using a trapezoid allocation of the zones in the Cartesian plane [13].

Usually, QCA circuits and systems follow the clocking-zone partition scheme of [12]. Designs are partitioned into multiple clocking zones along one dimension only (for example, the x-axis), thus effectively creating columns (as zones). Clocking and pipelining require designs to maintain sets of four adjacent zones at any time (as per the four phases, i.e., switch, hold, release, and relax). Clocking is applied to each zone of a QCA design through underlying circuitry that generates a signal as shown in Fig. 3 [20]. This circuit generates the electric field that modulates the tunneling barrier of all cells in the zone (adiabatic switching). To maintain zones in sets of four phases, four conducting wires (each carrying the signal shown in Fig. 3) are required, with successive signals phase-shifted by $\pi/2$. Clocking requires metal lines (underlying the cells) with a substantially smaller feature size as well as circuitry for generating the required signals [19].

Fig. 3. Four-phased signal for clocking (adiabatic switching).

## **III. CLOCKING ANALYSIS**

For QCA, adiabatic switching is commonly preferred over abrupt switching [12]. In an adiabatic approach, switching is accomplished by modulating the interdot tunneling barrier of the QCA cells. By applying an input signal, the barriers are lowered such that cells begin to polarize. By raising the barriers again, cells are held, or "crystallized," in their new states. If the change in the interdot potential barrier is gradual, then adiabatic theory guarantees that the system always remains in the ground state and does not permanently move to an excited or metastable state [21]. A system is said to be in the ground state if it has minimum energy, i.e., all cells polarize and attain the states expected from cell-to-cell interactions. In an excited state, cells align contrary to the cell-to-cell electron repulsion, and a kink is said to have occurred.

In an adiabatic-switching scheme, fluctuations in operating temperature may excite the QCA cells above their ground state and produce erroneous results at the output. Lent et al. [22] provide an analysis of these thermal effects on a linear array (or line) of QCA cells. Let $E_k$ represent the energy required for a kink to form at a QCA cell (i.e., for the cell to align differently from its expected polarization). As the number of QCA cells in the linear array increases, the ground state remains unique, and the energy separation between the ground state and the first excited state remains $E_k$. However, with an increasing number of cells, the number of possible kink locations increases, and therefore, multiple kinks may occur. The probability of kink-free behavior is therefore a function of $N$, the number of cells in the array. In addition, at any nonzero temperature, the higher the operating temperature $T$, the larger the thermal fluctuations, which increase the probability of kink occurrence. Finally, the probability for a system to be in an excited state (kink) is a function of the kink energy $E_k$. A higher value of $E_k$ reduces the probability of kink occurrence (as cell dimensions scale down to the molecular level, the correlation between electrons in neighboring cells increases, thus increasing $E_k$). For $N$ QCA cells, these parameters are quantified in the following equation (derived in [22]):

$$\Delta F_n = nE_k \left[ 1 - \frac{k_B T}{E_k} \ln(N) \right]. \tag{1}$$

Here, $\Delta F_n$ is the energy separation between the ground state and the $n$th excited state, i.e., a zone with $n$ kinks, and $k_B$ is the Boltzmann constant. As long as $\Delta F_n$ is greater than zero, the QCA system does not settle in an excited thermodynamic equilibrium state. This implies that the energy required for the $n$ kinks, $nE_k$, must be greater than the thermal term $n k_B T \ln(N)$. From this inequality, for a given kink energy $E_k$ and operating temperature $T$, a bound on the number of QCA cells to avoid kinks is given by

$$N \le e^{E_k/(k_B T)}. \tag{2}$$
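Equations (1) and (2) can be evaluated directly. The following is a minimal sketch in which energies are expressed in units of $k_B T$; the function names are hypothetical. `min_e_k_over_k_b_t` inverts (2), giving the smallest ratio $E_k/(k_B T)$ that keeps an $N$-cell line kink-free.

```python
import math


def delta_f(n: int, n_cells: int, e_k: float, k_b_t: float) -> float:
    """Eq. (1): separation between the ground state and the n-kink state of an N-cell line."""
    return n * e_k * (1.0 - (k_b_t / e_k) * math.log(n_cells))


def max_kink_free_cells(e_k_over_k_b_t: float) -> int:
    """Largest integer N satisfying Eq. (2), N <= exp(E_k / (k_B T))."""
    return math.floor(math.exp(e_k_over_k_b_t))


def min_e_k_over_k_b_t(n_cells: int) -> float:
    """Smallest E_k / (k_B T) for which an N-cell line satisfies Eq. (2)."""
    return math.log(n_cells)


if __name__ == "__main__":
    print(max_kink_free_cells(3.0))           # 20
    print(round(min_e_k_over_k_b_t(51), 1))   # 3.9
    print(round(min_e_k_over_k_b_t(13), 1))   # 2.6
```

The bound grows exponentially in $E_k/(k_B T)$, so a modest increase in kink energy (or decrease in temperature) admits much longer lines.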

The bound on line (array) length obtained from (2) can be used to determine the largest zone dimension under worst-case conditions. If $N$ bounds both the vertical and horizontal dimensions of a zone, then, from (2), thermodynamic effects are avoided in all QCA lines within that zone. Therefore, kink-free behavior can be accomplished by establishing an upper bound on $N$ for the dimensions of a clocking zone. For QCA pipelining, only one zone (among a set of four adjacent zones) is in the switch phase at any time; therefore, the effective length of a long QCA line (which may span multiple zones) is limited to its segment in the switching zone, i.e., it is bounded by the dimension of that zone.

## IV. TWO-DIMENSIONAL QCA CLOCKING

The QCA-clocking mechanism proposed in [12] partitions a design into different zones only along one direction of signal flow, i.e., the x-axis. Such a scheme considers long horizontal lines and divides them among multiple (vertical) clocking zones, thus keeping their length bounded in any zone. A vertical line (in the y-axis) is always contained within a column as a single clocking zone; for complex designs, the height of a clocking zone (along the y-axis) could be significant, thus creating long vertical lines.

Consider the QCA design of the 8-to-1 multiplexer, as shown in Fig. 4; throughout this paper, this is used as a representative circuit for comparison purposes among the proposed clocking schemes. This circuit is designed using three  $(\log_2(8))$  stages of 2-to-1 multiplexers. The four 2-to-1 multiplexers in stage 1 reduce the eight inputs to four based on the select signal SEL1. Two 2-to-1 multiplexers in stage 2 reduce these four signals to two based on SEL2, and finally, a 2-to-1 multiplexer in stage 3 selects one of its two inputs as an output based on SEL3. Each 2-to-1 multiplexer is designed using three MVs (two as AND gates and one as an OR gate) and an inverter. As the SEL1 signal must be supplied to all 2-to-1 multiplexers in stage 1, a long vertical line is required (51 cells long in clocking zone 2). The length of the vertical line increases with multiplexer size (N) because the select signal must be supplied to N/2 2-to-1 multiplexers in stage 1.
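The three-stage structure can be sketched behaviorally as follows. This is a logic-level model only, not the cell layout of Fig. 4; it assumes that SEL1 is the least significant select bit, that adjacent inputs are paired in stage 1, and that a select value of 0 picks the first input of each pair. The function names are hypothetical.

```python
def mv(a: int, b: int, c: int) -> int:
    """Majority voter: AB + AC + BC."""
    return (a & b) | (a & c) | (b & c)


def mux2(d0: int, d1: int, sel: int) -> int:
    """2-to-1 mux from three MVs and an inverter: (d0 AND NOT sel) OR (d1 AND sel)."""
    not_sel = 1 - sel           # inverter
    low = mv(d0, not_sel, 0)    # MV used as AND
    high = mv(d1, sel, 0)       # MV used as AND
    return mv(low, high, 1)     # MV used as OR


def mux8(inputs: list[int], sel1: int, sel2: int, sel3: int) -> int:
    """8-to-1 mux as three stages of 2-to-1 muxes (SEL1 assumed to be the LSB)."""
    # Stage 1: four muxes share SEL1 (the long vertical line in the 1-D layout).
    stage1 = [mux2(inputs[2 * i], inputs[2 * i + 1], sel1) for i in range(4)]
    # Stage 2: two muxes share SEL2.
    stage2 = [mux2(stage1[2 * i], stage1[2 * i + 1], sel2) for i in range(2)]
    # Stage 3: a single mux driven by SEL3 produces the output.
    return mux2(stage2[0], stage2[1], sel3)


if __name__ == "__main__":
    print(mux8([0, 1, 0, 0, 0, 0, 0, 0], 1, 0, 0))  # selects input 1
```

Under this pairing the output is `inputs[sel1 + 2*sel2 + 4*sel3]`. The fan-out of SEL1 to all N/2 stage-1 muxes is what makes its routing line grow with N.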

This paper solves the problem of long vertical lines by partitioning the QCA design along the y-axis (rowwise) in addition to the x-axis (columnwise). This 2-D arrangement effectively generates a grid of clocking zones for a given QCA design. A bound on the zone dimensions restricts the length of the QCA lines and makes the QCA designs tolerant to thermodynamic effects. QCA systems are typically designed with a so-called "tournament-bracket" structure [13]: logic signals propagate horizontally through the MVs, with outputs produced toward the end of the bracket. This feature favors partitioning designs into multiple clocking zones along the x-axis, i.e., in the direction of horizontal propagation. When introducing a clocking mechanism for 2-D partitioning (i.e., a grid of zones), extensive modifications to the original QCA design should be avoided where possible. Signal propagation must therefore remain similar to the 1-D case: all zones in a column of the 2-D grid must be switched before the zones in the next column are switched. Fig. 5 shows the signal propagation for the proposed 2-D partitioning of QCA designs. Signals propagate vertically in each column; after all zones in a column have switched, the signal propagates horizontally to the next column of the grid. Therefore, at a frequency that is reduced according to the number of zones in a column, signal propagation along the x-axis is still equivalent to the 1-D clocking case.

Fig. 4. 8-to-1 QCA multiplexer (1-D clocking).

For correct operation of the QCA design, all signals in a clocking zone must be made available to the next stage during its switch phase. In the 2-D case, a signal must propagate both vertically and horizontally. Therefore, if a zone in the hold phase is released as soon as the next zone in the same column completes its switch phase, then its signals will not be available during the switch phase of the corresponding zone in the next column. This inhibits signal propagation along the x-axis, possibly leading to incorrect behavior of the QCA system. Therefore, all zones in a column must be retained in the hold phase until the corresponding zones in the next column are in the switch phase. In the conventional (1-D) clocking mechanism for QCA, a zone is released as soon as the next zone is switched. The proposed mechanism for 2-D signal propagation in a grid is similar to the 1-D case in that a zone can be released as soon as the zones following it (along both dimensions) are switched. Similarly, a zone can be switched only when its driving zones (in both dimensions) are in the hold phase.

Fig. 5. Proposed 2-D QCA clocking.

The proposed 2-D clocking mechanism requires changes (albeit minor) to the existing QCA designs (based on 1-D clocking). Changes are required to preserve the direction of logic propagation in the QCA lines, as shown in Fig. 5. Clocking requirements and changes in design are summarized by the following rules.

- 1) Switch all zones in a column prior to switching the zones in the next column.
- 2) Keep an entire column in the hold state until all zones located in the next column are switched.
- 3) Vertical lines spanning multiple zones should accept signals only in the zone from which they originate [this is referred to as design modification-I (DMI)].
- 4) Signals should not travel in a direction opposite to logic propagation, either within a column or between columns [this is referred to as design modification-II (DMII)].
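Rules 1 and 2 can be expressed as a simple switching schedule. The sketch below assumes that every zone takes one time unit to switch and that each zone is driven by its north and west neighbors. The names are hypothetical, as is the 6 x 4 grid used in the usage note, which is inferred from Table I (6 zones under 1-D clocking, 24 under 2-D). The check confirms that each zone's drivers have already switched and are still holding when the zone switches.

```python
def column_sequential_schedule(zx: int, zy: int) -> dict[tuple[int, int], int]:
    """Switch step of each zone (x = column, y = row from the top).

    Rule 1: all zones of column x switch, top to bottom, before column x + 1.
    Assumes every zone takes one time unit to switch.
    """
    return {(x, y): x * zy + y for x in range(zx) for y in range(zy)}


def release_step(x: int, zy: int) -> int:
    """Rule 2: column x holds until every zone of column x + 1 has switched."""
    return (x + 2) * zy  # the step after the last zone of column x + 1 switches


def drivers_holding(schedule: dict[tuple[int, int], int], zy: int) -> bool:
    """Check that each zone's north and west drivers switched earlier and are not yet released."""
    for (x, y), t in schedule.items():
        for driver in ((x, y - 1), (x - 1, y)):
            if driver not in schedule:
                continue
            if not schedule[driver] < t < release_step(driver[0], zy):
                return False
    return True


if __name__ == "__main__":
    sched = column_sequential_schedule(6, 4)
    print(max(sched.values()) + 1)   # total switching steps
    print(drivers_holding(sched, 4))
```

For the assumed 6 x 4 grid, the schedule needs 24 steps, one per zone, which matches the "~24 time units" entry for the 2-D scheme in Table I.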

Figs. 4 and 6 show the QCA design of the 8-to-1 multiplexer under the original 1-D and the proposed 2-D schemes, respectively. The design modification rules given previously have been applied to this circuit (DMI is applied to SEL1 and SEL2, whereas DMII is used for the MV in the last zone of the grid), using the logic propagation shown in Fig. 5. The clock-zone dimensions in Fig. 6 are on the order of tens of cells along both axes. This is consistent with the clocking-zone widths suggested by other works [12], [13] that partition along 1-D only. Note that vertical lines receive signals in the zone from which they originate and that an MV has been moved down within a column to avoid interzone signal transfer in a direction opposite to the logic propagation. As the multiplexers of Figs. 4 and 6 show, the two circuits are almost the same, and therefore, they occupy the same area. In terms of QCA cell count, the design modifications introduce a small overhead: the design in Fig. 4 has 564 cells, whereas that in Fig. 6 has 576, an increase of about 2%.

Fig. 6. 8-to-1 QCA multiplexer (2-D clocking).

As a signal in a QCA line propagates through the sequential switching of cells from the input to the output, intuitively, it would take twice as long to switch twice as many cells. The relationship between switching time and number of cells for a given error margin (as related to nonadiabatic ringing) can be assessed by solving the time-dependent Schrödinger equation. Lent and Tougaw [12] have provided the solution, giving the dependence of the minimum switching time on the number of cells in a line as

$$T_{\rm s} \propto C^{1.16} \tag{3}$$

where $T_s$ is the minimum switching time and $C$ is the number of cells in the line. The exponent of 1.16 suggests that the switching time depends almost linearly on the number of cells (approximately $O(C)$). The small deviation from linearity results from fitting the error maxima (nonadiabatic ringing) in the solutions of the Schrödinger equation.
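Because (3) is a proportionality, only ratios of switching times are meaningful. The following minimal sketch (hypothetical function name) compares the 51-cell and 13-cell longest lines of the 8-to-1 multiplexer example. Under this model the 51-cell line needs roughly 4.9 times the minimum switching time of the 13-cell line.

```python
def relative_switching_time(cells_a: int, cells_b: int, exponent: float = 1.16) -> float:
    """T_s(cells_a) / T_s(cells_b) under T_s proportional to C**exponent (Eq. (3)).

    The unknown proportionality constant cancels in the ratio.
    """
    if cells_a <= 0 or cells_b <= 0:
        raise ValueError("cell counts must be positive")
    return (cells_a / cells_b) ** exponent


if __name__ == "__main__":
    # Longest vertical lines of the 8-to-1 multiplexer under 1-D and 2-D clocking.
    print(f"{relative_switching_time(51, 13):.2f}")
    # Doubling a line costs slightly more than 2x because the exponent exceeds 1.
    print(f"{relative_switching_time(2, 1):.3f}")
```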

The minimum clock period for a clocking zone is determined by the switching time of the longest QCA line in that zone. In most cases, the length of the longest QCA line is proportional to the vertical and horizontal dimensions of a zone. Therefore, even though the number of zones per column in the grid increases, the minimum clock period of each zone decreases because of the smaller zone dimensions. As a result, the clock period of a column with no partitions (as in the original 1-D scheme) and that of a column with partitions (as in the proposed scheme) are linearly related.
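As a rough illustration of this relationship (an idealization, not a derivation given in the paper), assume that a column of height $H$ cells is split into $Z_y$ equal zones, that the longest line in each zone spans the zone height, and that the zones of a column switch one after another. Applying (3) to each zone gives

$$\frac{T_{\rm col}^{\rm 2D}}{T_{\rm col}^{\rm 1D}} = \frac{Z_y \,(H/Z_y)^{1.16}}{H^{1.16}} = Z_y^{-0.16}.$$

The factor depends only weakly on $Z_y$ (for $Z_y = 4$, it is $4^{-0.16} \approx 0.80$), which is consistent with the two column periods scaling linearly with each other.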

The total computation period is the sum of the clock periods of all columns in the QCA design; it is almost the same for both the 1- and 2-D schemes. Pipelining is not affected because an entire column is used to hold the signals. In both clocking schemes, four columns are required to propagate one state of computation. For the 8-to-1 QCA multiplexer, the proposed 2-D scheme reduces the longest vertical line length from 51 to 13 cells (as shown in column 2 of Figs. 4 and 6). From (2), for a line of 51 cells to avoid kinks, the kink energy $E_k$ of the cells must be at least 3.9 times $k_B T$ ($\ln 51 \approx 3.9$); for a line of 13 cells, it need only be 2.6 times $k_B T$ ($\ln 13 \approx 2.6$). Therefore, for a given QCA technology (i.e., for a fixed $E_k$), if the 8-to-1 QCA multiplexer using the proposed 2-D clocking scheme can be operated at room temperature (300 K), then the 1-D version of the same circuit must be operated at about 195 K. However, the clocking circuitry required for the 2-D scheme is more complicated than that of the 1-D scheme. Therefore, although the 2-D scheme solves the problem of long vertical lines with respect to the kink energy, it is still not a complete improvement over the 1-D clocking scheme, as it provides no improvement in throughput. A detailed discussion of this topic is provided in a later section of this paper.
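The temperature figures above follow from (2) with a fixed $E_k$. The following minimal sketch uses a hypothetical function name and assumes, as the text does, that the 13-cell line is exactly at its kink-free limit at 300 K. That assumption fixes $E_k = k_B \cdot 300\,\mathrm{K} \cdot \ln 13$, and the maximum temperature of the 51-cell line follows from $T_{\max} = E_k / (k_B \ln 51)$.

```python
import math


def max_kink_free_temperature(n_cells: int, ref_cells: int, ref_temp_k: float) -> float:
    """Highest kink-free temperature (K) of an n-cell line, assuming the same E_k
    makes a ref_cells line exactly kink-free at ref_temp_k (from Eq. (2))."""
    # Fixed technology: E_k / k_B = ref_temp_k * ln(ref_cells), so T_max = (E_k / k_B) / ln(N).
    return ref_temp_k * math.log(ref_cells) / math.log(n_cells)


if __name__ == "__main__":
    print(f"{max_kink_free_temperature(51, 13, 300.0):.1f} K")
```

The result is about 195.7 K, consistent with the 195 K quoted above.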

# V. TWO-DIMENSIONAL WAVE QCA CLOCKING

Significant improvements in computation time and a simplification of the clocking circuitry can be achieved by employing a different clocking mechanism for QCA designs partitioned along two dimensions. This new scheme is based on parallel execution and processing in clocking zones within a different timing framework.

The principles of this technique are based on the similarity between systolic arrays and QCA with respect to clocking. Systolic arrays are special-purpose VLSI architectures introduced in the late 1970s [23]; they consist of simple processing elements with local interconnections, usually arranged in a grid layout. Each processing element receives data from one or more neighboring processing elements (at its primary inputs), performs a local computation, and transfers its results to other neighboring processors (connected to its primary outputs). Two-dimensional (square) systolic arrays are used for parallel processing of matrix multiplication; they accept inputs from two sides and propagate outputs to the other two sides. In terms of its partitioning into clocking zones, the proposed 2-D arrangement is similar to a grid with orthogonal interconnections. Computational results move from northwest to southeast, similar to the implementation in a 2-D (square) systolic array. Due to these similarities, logic-wavefront propagation techniques developed for systolic arrays can also be considered for QCA architectures to increase data pipelining and parallel processing [24].

Fig. 7 shows a logic-propagation technique for the proposed 2-D diagonal wave scheme (2DDWave). To retain similarity to the 2-D (square) systolic array (and thereby achieve parallel processing), each zone must accept input signals only from two zones (north and west) and pass its outputs to the other two zones (south and east), i.e., each column must have an equal number of zones (perfect grid). Therefore, to ensure efficient utilization of the wavefront propagation scheme, a further design modification rule must be applied in addition to the rules presented for the 2-D QCA-clocking scheme of the previous section: the design must be partitioned into a perfect grid of zones such that all zones in a row have the same height and all zones in a column have the same width. Figs. 6 and 8 show the 8-to-1 QCA multiplexer before and after applying this rule. In this perfect-grid scheme, the correct switching of a zone requires only two zones (one located above and one located to the left of the switching zone) to be in the hold phase. Similarly, a zone needs to remain in the hold phase only until the zones located below (south) and to the right (east) are switched. With this switching arrangement, the proposed diagonal wavefront propagation scheme (2DDWave) produces the same output results as the 1- and 2-D schemes presented previously.

Fig. 7. Clocking for the 2-D wave propagation.

In a 1-D clocking scheme, the lengths of vertical lines are not bounded, because they grow with the design size. As the maximum operating temperature $T$ depends on the number of cells $N$ in the longest QCA line of a clocking zone, $T$ becomes a function of the design size. In the proposed 2-D schemes, however, line lengths can be bounded independently of the design size, as partitioning occurs along both the x- and y-axes. Therefore, QCA designs under the 2-D schemes are robust to thermal fluctuations and can be operated at higher temperatures, largely independent of size. The 2-D scheme relies on sequential processing: all zones in a column are switched one after another before the zones in the next column are switched (Fig. 5). In the proposed 2-D wave-clocking scheme (2DDWave), switching is performed in parallel: all zones located along a diagonal are switched simultaneously. Therefore, the computation time of the 2-D scheme grows with the product of the numbers of zones along the x- and y-axes ($Z_x \times Z_y$), whereas in the 2DDWave scheme, it grows linearly with their sum ($Z_x + Z_y$). As shown in a previous section, the computation times of the 1- and 2-D schemes are equivalent; the proposed 2DDWave scheme therefore achieves a higher processing speed than both.
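The diagonal wavefront can be sketched as a schedule in which zone $(x, y)$ switches at step $x + y$, so that its north and west drivers switch exactly one step earlier. This sketch rests on several assumptions: one time unit per zone switch, hypothetical function names, and a 6 x 4 grid consistent with the 6 zones listed in Table I for 1-D clocking and the 24 zones for 2DDWave.

```python
from collections import Counter


def diagonal_wave_schedule(zx: int, zy: int) -> dict[tuple[int, int], int]:
    """Switch step of each zone (x = column, y = row from the top): all zones on
    the wavefront x + y = d switch together at step d."""
    return {(x, y): x + y for x in range(zx) for y in range(zy)}


def drivers_one_step_earlier(schedule: dict[tuple[int, int], int]) -> bool:
    """Check that each zone's north and west drivers switched exactly one step earlier."""
    for (x, y), t in schedule.items():
        for driver in ((x, y - 1), (x - 1, y)):
            if driver in schedule and schedule[driver] != t - 1:
                return False
    return True


def total_steps(schedule: dict[tuple[int, int], int]) -> int:
    """Number of steps from the first to the last switching wavefront."""
    return max(schedule.values()) + 1


if __name__ == "__main__":
    wave = diagonal_wave_schedule(6, 4)
    print(total_steps(wave))                        # Z_x + Z_y - 1 wavefronts
    print(drivers_one_step_earlier(wave))
    print(sorted(Counter(wave.values()).items()))   # zones switched per step
```

For the assumed 6 x 4 grid, this gives 9 steps (that is, $Z_x + Z_y - 1$), versus 24 for column-by-column switching, in line with the "~9" and "~24 time units" entries of Table I. Because every zone on a wavefront has the same step offset, its clock phase can be taken as $(x + y) \bmod 4$. This is why all zones on a diagonal can share one clock wire.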



Fig. 8. 8-to-1 QCA multiplexer (2-D wave clocking).

Table I shows the characteristics of the three clocking schemes discussed in this paper for the 8-to-1 QCA multiplexer design (Figs. 4, 6, and 8) as an example. As the underlying feature of both 2-D schemes is to partition the QCA system along the x- and y-axes, they have common characteristics of kink-resilient behavior and a higher operating temperature (as discussed previously). However, as an additional advantage, the 2DDWave scheme improves the computation time.

As reported previously [12], [25], the QCA designs can be clocked by an electric field generated by a set of parallel conducting wires buried under the substrate. For the 1-D scheme, these metal wires are vertically oriented such that columns of clocking zones are formed. By keeping the set of four adjacent metal wires out of phase by  $\pi/2$  and by applying the signal shown in Fig. 9, clocking requirements can be satisfied. However, clocking in the 2-D case is more complicated because all zones in a column are clocked simultaneously during the hold, release, and relax phases, but they are clocked sequentially during the switch phase. Therefore, to provide a phase-based clocking, additional circuitry must supply multiple signals; moreover, multiplexing between them is also required (the reader should refer to [26] for additional details).

The 2DDWave scheme requires a simpler arrangement because all zones along the diagonals are clocked simultaneously in all phases. However, in this case, the set of parallel metal wires runs diagonally across the QCA design, i.e., each wire runs under all clocking zones located diagonally to each other. To provide a uniform electric field across a clocking zone, two layers of metal wires are required, as shown in Fig. 9. The diagonal metal wires run in (bottom) layer 1 over the entire QCA design. The metal wires in (top) layer 2 are small and disjoint, and each extends only over a single clocking zone to provide a uniform electric field. Metal wires in layers 1 and 2 are insulated by an oxide layer so that the electric field generated by metal layer 1 does not interfere with the electric field of metal layer 2. The signal in metal layer 1 is transferred to the metal wires in layer 2 (for the diagonal clocking zones) through vias. A ground plane (not shown in the figure) can be added on top of the QCA layer to reduce fringing of the E-field lines [19], [20].

TABLE I

COMPARISON OF DIFFERENT CLOCKING SCHEMES

| Characteristics                     | 1D                  | 2D                  | 2DDWave            |
|-------------------------------------|---------------------|---------------------|--------------------|
| No. of Cells $(C)$                  | 564                 | 576                 | 576                |
| No. of Zones $(Z)$                  | 6                   | 24                  | 24                 |
| Max. Wire Length $(L)$              | 51                  | 13                  | 13                 |
| $\frac{E_k}{kT}$ for kink-free operation | 3.9            | 2.6                 | 2.6                |
| Max. Temp. for kink-free operation  | 195 K               | 300 K               | 300 K              |
| Computation Time                    | $\sim$24 time units | $\sim$24 time units | $\sim$9 time units |
| Pipelining                          | Four-staged         | Four-staged         | Four-staged        |
| Clocking Circuitry                  | Modest              | Complex             | Modest             |

Fig. 9. Clocking circuitry for the QCA designs. (a) Circuitry for the 1-D clocking scheme. (b) Circuitry for the 2DDWave clocking scheme. (c) Second layer of metal wires to provide a uniform electric field over a clocking zone in the 2DDWave scheme.
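The diagonal wiring can be pictured with a small grid model. Index the zones by row (north to south) and column (west to east). The 2DDWave assignment can then be described as clock (row + col) mod 4 for each zone, so a single layer-1 wire can serve each diagonal. This indexing and the function names below are illustrative assumptions, not the layout of Fig. 9.

```python
# Hypothetical sketch of the 2DDWave clock assignment on a grid of clocking zones.
def clock_number(row: int, col: int) -> int:
    """Assumption: zone (row, col) is driven by clock (row + col) mod 4, so every
    zone on the same SW-NE diagonal shares one clock signal."""
    return (row + col) % 4


def clock_grid(rows: int, cols: int) -> list:
    return [[clock_number(r, c) for c in range(cols)] for r in range(rows)]


def inputs_arrive_from_nw(grid: list) -> bool:
    """Check that each zone's north and west neighbours are exactly one clock
    phase earlier, i.e., data can enter a zone only from the north or the west."""
    for r, row in enumerate(grid):
        for c, k in enumerate(row):
            if r > 0 and grid[r - 1][c] != (k - 1) % 4:
                return False
            if c > 0 and grid[r][c - 1] != (k - 1) % 4:
                return False
    return True


if __name__ == "__main__":
    for row in clock_grid(3, 4):
        print(row)
```

Under this model, the wave of switching zones moves from northwest to southeast one diagonal per quarter period.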

Logic-level effects due to interference between the electric fields of adjacent clocking wires are minor. A QCA cell at a zone boundary must belong to one of the two adjacent clocking zones, depending on the strengths of the electric fields in the corresponding layer-2 metal wires. Therefore, this interference can be tolerated by designing circuits so that the QCA cells at clocking-zone boundaries can belong to either clocking zone without modifying the logic functionality.

The 8-to-1 multiplexer design can be extended to other circuits with similar functionality. Fig. 10 shows the QCA design of a 3-to-8 decoder under the 2DDWave scheme. This circuit can be used in interconnection networks and for memory address decoding [8]. Its design is similar to that of the 8-to-1 multiplexer (shown in Fig. 8). It uses a few MVs reduced to AND/OR gates at each of the $\log_2(n) = 3$ stages (for $n = 8$ outputs) to decode the address. Fig. 10 shows the design modifications that are required under the 2DDWave scheme to overcome the tournament-bracket (tree) structure of the 1-D clocking technique.
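Stage-by-stage decoding can be illustrated behaviorally. This sketch assumes the standard QCA reduction of the majority voter (MV) to a two-input AND gate by fixing one input to logic 0. It also assumes one plausible organization, in which each of the three stages handles one address bit and doubles the number of decoded lines. It is not a transcription of the layout in Fig. 10, and the function names are hypothetical.

```python
# Hypothetical behavioral sketch of a 3-to-8 decoder built from MV-based AND gates.
def mv(a: int, b: int, c: int) -> int:
    """Three-input majority voter (MV), the basic QCA logic gate."""
    return (a & b) | (b & c) | (a & c)


def and_gate(a: int, b: int) -> int:
    # An MV with one input fixed to logic 0 behaves as a two-input AND.
    return mv(a, b, 0)


def decode_3to8(a2: int, a1: int, a0: int) -> list:
    """One stage per address bit (log2(8) = 3 stages).

    Stage 1 only needs a0 and its complement. Stages 2 and 3 AND every
    existing line with the complemented and the true address bit, doubling
    the number of decoded lines (2 -> 4 -> 8). Output i is 1 iff the address is i.
    """
    lines = [1 - a0, a0]
    for bit in (a1, a2):
        lines = [and_gate(line, 1 - bit) for line in lines] + [
            and_gate(line, bit) for line in lines
        ]
    return lines


if __name__ == "__main__":
    print(decode_3to8(1, 0, 1))  # address 5 -> [0, 0, 0, 0, 0, 1, 0, 0]
```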

#### VI. FEEDBACK PATHS

One of the main issues arising in the clocking schemes for QCA is the ability to handle feedback paths. In both the 1-D and the proposed 2-D clocking schemes, signal propagation is strictly unidirectional [from west to east in the 1-D case (Fig. 4) and from northwest to southeast in the 2-D case (Fig. 7)]. Hence, although these clocking schemes are readily applicable to combinational circuits, feedback paths (as in sequential circuits) may require a different technique.



Fig. 10. 3-to-8 decoder under the 2DDWave clocking scheme.

Niemier and Kogge [13] have proposed a trapezoid clocking mechanism for the 1-D scheme. It enables feedback paths in QCA designs and makes better use of the layout area by exploiting the tournament-bracket structure of QCA circuits. The main principle of the trapezoid approach for handling feedback paths is a sequence of clocking zones that loops backward along the (feedback) path. This allows a QCA wire in a loop of clocking zones to route a feedback signal even though signal propagation between the clocking zones is still unidirectional. The trapezoid mechanism [13] can also be adopted for the proposed 2-D clocking schemes to allow feedback paths. Fig. 11 shows the loop of clocking zones for implementation under a 2-D scheme. To allow feedback paths, the zones in each region are clocked using the 2DD wave scheme, with the following directions of signal propagation:

- from northwest to southeast in regions 1 and 2;
- from northeast to southwest in regions 3 and 4;
- from southeast to northwest in region 5;
- from southwest to northeast in region 6.

Thus, the circuits in all six regions can receive their own outputs among their inputs through the feedback paths. The circuits can also receive new inputs and propagate their outputs. For example, region 2 receives a feedback input from the west and propagates the feedback path toward the south; at the same time, it can receive new inputs from the north and send out its outputs toward the east. If each region in Fig. 11 has only one zone, the feedback path reduces to the basic trapezoid clocking mechanism of [13]. Different directions of signal propagation in the regions do not add complexity to the underlying clocking circuitry. The zones in each region are still clocked using the same quasi-adiabatic switching mechanism (consisting of four clock phases), generated by the wires that produce the E field for the clock signal. To achieve the required directions of signal propagation, the clock phases of the zones must be scheduled so that switching of the final zone in the 2DD wave of a region is followed by switching of the first zone in the 2DD wave of the next region, i.e., synchronization of clock phases between regions must be maintained.

Fig. 11. Feedback path for the 2-D clocking schemes.
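The synchronization rule between regions can be written as a simple schedule. This sketch rests on three illustrative assumptions. Each region's 2DD wave consists of a known number of diagonals. Consecutive diagonals use consecutive clock phases. The regions are visited in order. Under these assumptions, the clock of the first diagonal of each region follows directly from the clock of the last diagonal of the previous region. The function name and the diagonal counts in the example are hypothetical and do not describe Fig. 11.

```python
# Hypothetical sketch of clock-phase synchronization between consecutive regions.
def region_start_clocks(diagonals_per_region: list, first_clock: int = 0) -> list:
    """Clock (0-3) driving the first diagonal of each region.

    Assumptions: diagonal j of a region uses clock (start + j) mod 4, and the
    first diagonal of region k + 1 switches one phase after the last diagonal
    of region k.
    """
    starts = []
    clock = first_clock
    for count in diagonals_per_region:
        if count < 1:
            raise ValueError("each region needs at least one diagonal of zones")
        starts.append(clock % 4)
        # The last diagonal of this region uses clock + count - 1,
        # so the next region starts at clock + count.
        clock += count
    return starts


if __name__ == "__main__":
    # Hypothetical diagonal counts for regions 1-6.
    print(region_start_clocks([3, 3, 2, 2, 3, 3]))  # [0, 3, 2, 0, 2, 1]
```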

Thus, the proposed 2-D clocking schemes can be used for clocking both the combinational and sequential circuits in QCA while avoiding the problem of kinks and improving the performance. The proposed schemes are general; for memories, Vankamamidi *et al.* [8], [9] have proposed architectures that also target the problem of kinks by making the QCA line length in a clocking zone independent of the memory size.

#### VII. SIMULATION RESULTS

QCADesigner [15] provides a design and simulation environment for QCA circuits; it has multiple simulation engines and computer-aided-design capabilities. In this section, simulation results for the proposed 2-D diagonal (2DD) clocking scheme are presented using the QCADesigner. Three logic circuits have been designed and simulated using the 2DD clocking scheme. For all simulations, a QCA cell size of 18 nm and a dot size of 5 nm are used. Results are obtained using the coherence vector engine of the QCADesigner.

Figs. 12, 14, and 16 show the circuits designed in the QCADesigner. They are clocked so that, when a 2-D grid is imposed, all zones along a diagonal are in the same clock phase.

##### A. 2-to-1 Multiplexer

Fig. 12 shows the design of a 2-to-1 Mux; it requires two AND gates followed by an OR gate. The 2-to-1 Mux is the building block for larger multiplexers in a recursive form (e.g., an *n*-to-1 multiplexer is built using two (n/2)-to-1 multiplexers and a 2-to-1 Mux). The 2-D grid is imposed to show that all diagonal zones are in the same clock phase. All design requirements for 2DD wave clocking are met because signals flow from northwest to southeast. Fig. 13 shows the simulation results for the inputs and the outputs. Input Sel is defined by the bit string 0000111100001111, Input B by 0011001100110011, and Input A by 0101010101010101. Therefore, Output Out is given by XX00110101001101, where X is a don't-care value. This is the logic behavior of a 2-to-1 Mux: Out follows A when Sel = 1 and B when Sel = 0. There is a delay of three clock periods because it takes three clock periods for the inputs to reach the output.

Fig. 12. 2-to-1 multiplexer under the 2DD-wave-clocking scheme simulated by the QCADesigner.

[Waveform traces: inputs Sel, B, and A (polarization from −1.00 to 1.00), output Out (from −0.95 to 0.99), and CLOCK 0–3.]

Fig. 13. Waveforms for the 2-to-1 multiplexer under the 2DD clocking scheme simulated by the QCADesigner.
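The reported waveforms can be cross-checked with a purely logical model. This sketch builds the 2-to-1 Mux from MV-based gates, assuming the standard reductions AND(a, b) = MV(a, b, 0) and OR(a, b) = MV(a, b, 1). It builds the *n*-to-1 Mux recursively from two (n/2)-to-1 Muxes and a 2-to-1 Mux, as described in the text. It then compares the ideal output with the reported Out trace. The leading don't-care samples are modeled simply as a shift of two samples in the bit strings. That is an assumption about how the trace is sampled, not a timing model of the QCA circuit. All function names are hypothetical.

```python
# Hypothetical behavioral model of the 2-to-1 and n-to-1 multiplexers (logic only, no QCA timing).
def mv(a: int, b: int, c: int) -> int:
    """Three-input majority voter (MV)."""
    return (a & b) | (b & c) | (a & c)


def and_gate(a: int, b: int) -> int:
    return mv(a, b, 0)  # MV with one input fixed to 0 acts as AND


def or_gate(a: int, b: int) -> int:
    return mv(a, b, 1)  # MV with one input fixed to 1 acts as OR


def mux2(sel: int, a: int, b: int) -> int:
    """Two ANDs followed by an OR: returns A when Sel = 1 and B when Sel = 0."""
    return or_gate(and_gate(a, sel), and_gate(b, 1 - sel))


def mux_n(inputs: list, sel: list) -> int:
    """n-to-1 Mux (n a power of two) from two (n/2)-to-1 Muxes and one 2-to-1 Mux.

    `sel` holds the select bits, most significant first.
    """
    if len(inputs) == 1:
        return inputs[0]
    half = len(inputs) // 2
    low = mux_n(inputs[:half], sel[1:])
    high = mux_n(inputs[half:], sel[1:])
    return mux2(sel[0], high, low)


def bits(s: str) -> list:
    return [int(ch) for ch in s]


def delayed(values: list, k: int) -> str:
    """Render a trace shifted by k samples, with X marking the don't-care samples."""
    return "X" * k + "".join(str(v) for v in values[: len(values) - k])


# Input bit strings reported for Fig. 13.
SEL = "0000111100001111"
B = "0011001100110011"
A = "0101010101010101"

if __name__ == "__main__":
    ideal = [mux2(s, a, b) for s, a, b in zip(bits(SEL), bits(A), bits(B))]
    print(delayed(ideal, 2))  # XX00110101001101, the reported Out trace
```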

##### B. One-Bit Full Adder

Fig. 14 shows a 1-b full adder designed using the QCADesigner. The implementation of the Carry Out is not shown, as it can be obtained in QCA by using a single MV gate. All the rules and techniques followed in the design of the 2-to-1 Mux are also used in this circuit, although it is much larger (it requires $9 \times 8$ zones). Fig. 15 shows the simulation results. Input A is 00001111, Input B is 00110011, Input Cin is 01010101, and Output Sum is XXXX0110, which is the logic behavior of a 1-b full adder. There is a delay of five clock periods because it takes five clock periods for the inputs to reach the output.

Fig. 14. One-bit adder under the 2DD clocking scheme simulated by the QCADesigner.

Fig. 15. Waveforms for the 1-b adder under the 2DD clocking scheme simulated by the QCADesigner.
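The full-adder result can be checked in the same logical way. The Carry Out is the single MV noted in the text. For the Sum, this sketch assumes a three-MV expression commonly used in the QCA literature, Sum = MV(NOT Cout, Cin, MV(A, B, NOT Cin)). The circuit of Fig. 14 may be organized differently. The four leading don't-care samples are again modeled as a plain shift of the bit strings, and the function names are hypothetical.

```python
# Hypothetical behavioral check of a 1-b full adder built from majority voters.
def mv(a: int, b: int, c: int) -> int:
    """Three-input majority voter (MV)."""
    return (a & b) | (b & c) | (a & c)


def full_adder(a: int, b: int, cin: int) -> tuple:
    """Assumed three-MV formulation (not necessarily the layout of Fig. 14):
    Cout = MV(A, B, Cin) and Sum = MV(NOT Cout, Cin, MV(A, B, NOT Cin))."""
    cout = mv(a, b, cin)
    total = mv(1 - cout, cin, mv(a, b, 1 - cin))
    return total, cout


def bits(s: str) -> list:
    return [int(ch) for ch in s]


def delayed(values: list, k: int) -> str:
    """Render a trace shifted by k samples, with X marking the don't-care samples."""
    return "X" * k + "".join(str(v) for v in values[: len(values) - k])


# Input bit strings reported for Fig. 15.
A, B, CIN = "00001111", "00110011", "01010101"

if __name__ == "__main__":
    sums = [full_adder(a, b, c)[0] for a, b, c in zip(bits(A), bits(B), bits(CIN))]
    print(delayed(sums, 4))  # XXXX0110, the reported Sum trace
```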

##### C. Reset-Set (RS) Flip-Flop

The proposed 2DD-wave-clocking scheme has also been evaluated for a sequential circuit with a feedback loop. Fig. 16 shows an RS flip-flop (originally proposed in [27]); Fig. 17 shows the corresponding schematic diagram. The logic circuit of the RS flip-flop is clocked using the 2DD wave clocking so that signals propagate between clocking zones from northwest to southeast. The feedback path is clocked so that signals propagate from northeast to southwest (i.e., the feedback path has only one row of clocking zones, and the signal propagation is from east to west).

Fig. 16. RS flip-flop under the 2DD clocking scheme simulated by the QCADesigner.

Fig. 17. Schematic of the RS flip-flop used in the QCA design.

Fig. 18. Waveforms for the RS flip-flop under the 2DD clocking scheme simulated by the QCADesigner.

Fig. 18 shows the simulation results for Input R as 00011100000000011, Input S as 11100000011100000, and Output Q as XX111000000111111, where X indicates a don't-care condition. There is a delay of three clock cycles because it takes three clock cycles for the inputs to reach the output.
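The sequential behavior can be checked with a behavioral latch model. This sketch assumes a single-MV realization with feedback, Q_next = MV(S, NOT R, Q). With this gate, S = 1 sets Q, R = 1 resets Q, and R = S = 0 holds Q through the feedback path. It is a textbook construction and not necessarily the circuit of [27] or Fig. 17. The two leading don't-care samples are modeled as a plain shift, and the function names are hypothetical.

```python
# Hypothetical behavioral model of an RS latch with an MV feedback loop.
def mv(a: int, b: int, c: int) -> int:
    """Three-input majority voter (MV)."""
    return (a & b) | (b & c) | (a & c)


def rs_latch_trace(r_bits: list, s_bits: list, q0: int = 0) -> list:
    """Assumed realization Q_next = MV(S, NOT R, Q). R = S = 1 is not allowed."""
    q = q0
    trace = []
    for r, s in zip(r_bits, s_bits):
        if r and s:
            raise ValueError("R = S = 1 is a forbidden input")
        q = mv(s, 1 - r, q)  # feedback: the previous Q is an input of the MV
        trace.append(q)
    return trace


def bits(s: str) -> list:
    return [int(ch) for ch in s]


def delayed(values: list, k: int) -> str:
    """Render a trace shifted by k samples, with X marking the don't-care samples."""
    return "X" * k + "".join(str(v) for v in values[: len(values) - k])


# Input bit strings reported for Fig. 18.
R = "00011100000000011"
S = "11100000011100000"

if __name__ == "__main__":
    print(delayed(rs_latch_trace(bits(R), bits(S)), 2))  # XX111000000111111, the reported Q trace
```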

#### VIII. CONCLUSION

The QCA has been advocated as a potential device architecture for nanotechnology. The QCA not only gives a solution at nanoscale but also offers a new method of computation and information transformation. However, the QCA designs of even modest complexity suffer from the disadvantage of long vertical lines in the placement of the cells, thus resulting in long delay, slow timing, inability to operate at higher (room) temperature, and sensitivity to thermal fluctuations.

In this paper, we have considered issues pertaining to the timing and clocking of QCA systems for high-performance computing. Different schemes for clocking and timing have been proposed; these schemes use novel 2-D techniques that reduce the longest line length in each clocking zone. Similar to [12], the proposed arrangements follow from the four phases required to operate the QCA cells correctly. Unlike previous works, the QCA design is partitioned into a grid of zones along both directions (vertically and horizontally) of signal flow. The proposed clocking schemes are based on the equivalence between systolic processing and QCA zone switching, thus permitting parallel processing of signals across both dimensions of the QCA circuit.

Because novel logic-propagation techniques are introduced, computation time and pipelining have been analyzed in detail as key performance metrics. The significant reduction in maximum line length permits fast timing and efficient pipelining while guaranteeing kink-free switching behavior. It has been shown that the proposed 2-D schemes can also be used in layouts with feedback paths, thus confirming their applicability to sequential circuits implemented in QCA. The proposed clocking schemes have been evaluated on both combinational and sequential circuits using the QCADesigner [15].

#### REFERENCES

- [1] R. Compano, L. Molenkamp, and D. J. Paul, "Technology roadmap for nanoelectronics," in *Proc. Eur. Comm. IST Programme, Future Emerging Technol.*, 1999.
- [2] C. S. Lent, B. Isaksen, and M. Lieberman, "Molecular quantum-dot cellular automata," *J. Amer. Chem. Soc.*, vol. 125, no. 4, pp. 1056–1063, Jan. 2003.
- [3] Y. Lu and C. S. Lent, "Theoretical study of molecular quantum dot cellular automata," in *Proc. IEEE Int. Workshop Comput. Electron.*, 2004, pp. 118–119.
- [4] M. T. Niemier, A. F. Rodrigues, and P. M. Kogge, "A potentially implementable FPGA for quantum dot cellular automata," in *Proc. 1st Workshop NSC-1, Held in Conjunction With 8th Int. Symp. on High Performance Computer Architecture (HPCA-8)*, Boston, MA, 2002.
- [5] P. D. Tougaw and C. S. Lent, "Logical devices implemented using quantum cellular automata," *J. Appl. Phys.*, vol. 75, no. 3, pp. 1818–1825, Feb. 1994.
- [6] V. S. Dimitrov, G. A. Jullien, and K. Walus, "Quantum-dot cellular automata carry-look-ahead adder and barrel shifter," in *Proc. IEEE Emerging Telecommun. Technol. Conf.*, Sep. 2002.
- [7] K. Walus, A. Vetteth, G. A. Jullien, and V. S. Dimitrov, "RAM design using quantum-dot cellular automata," in *Proc. Nanotechnology Conf.*, 2003, vol. 2, pp. 160–163.
- [8] V. Vankamamidi, M. Ottavi, and F. Lombardi, "Tile-based design of a serial memory in QCA," in *Proc. ACM Great Lakes Symp. VLSI*, 2005, pp. 201–206.
- [9] V. Vankamamidi, M. Ottavi, and F. Lombardi, "A line-based parallel memory for QCA implementation," *IEEE Trans. Nanotechnol.*, vol. 4, no. 6, pp. 690–698, Nov. 2005.
- [10] F. Hofmann, T. Heinzel, D. A. Wharam, J. P. Kotthaus, G. Böhm, W. Klein, G. Tränkle, and G. Weimann, "Single electron switching in a parallel quantum dot," *Phys. Rev. B, Condens. Matter*, vol. 51, no. 19, pp. 13 872–13 875, May 1995.
- [11] G. H. Bernstein, I. Amlani, A. Orlov, C. Lent, and G. Snider, "Observation of switching in a quantum-dot cellular automata cell," *Nanotechnology*, vol. 10, no. 2, pp. 166–173, Jun. 1999.
- [12] C. S. Lent and P. D. Tougaw, "A device architecture for computing with quantum dots," *Proc. IEEE*, vol. 85, no. 4, pp. 541–557, Apr. 1997.

- [13] M. T. Niemier and P. M. Kogge, "Problems in designing with QCAs: Layout = timing," *Int. J. Circuit Theory Appl.*, vol. 29, no. 1, pp. 49–62, 2001.
- [14] C. Ungarelli, S. Francaviglia, M. Macucci, and G. Iannaccone, "Thermal behavior of quantum cellular automaton wires," *J. Appl. Phys.*, vol. 87, no. 10, pp. 7320–7325, May 2000.
- [15] K. Walus, V. Dimitrov, G. A. Jullien, and W. C. Miller, "QCADesigner: A CAD tool for an emerging nano-technology," in *Proc. Micronet Annu. Workshop*, 2003. [Online]. Available: http://www.qcadesigner.ca/papers/micronet2003.pdf
- [16] A. O. Orlov, I. Amlani, G. H. Bernstein, C. S. Lent, and G. L. Snider, "Realization of a functional cell for quantum-dot cellular automata," *Science*, vol. 277, no. 5328, pp. 928–930, Aug. 1997.
- [17] C. G. Smith, "Computation without current," *Science*, vol. 284, no. 5412, p. 274, Apr. 1999.
- [18] A. O. Orlov, I. Amlani, R. Kummamuru, R. Rajagopal, G. Toth, C. S. Lent, G. H. Bernstein, and G. L. Snider, "Experimental demonstration of clocked single-electron switching in quantum-dot cellular automata," *Appl. Phys. Lett.*, vol. 77, no. 2, pp. 295–297, Jul. 2000.
- [19] S. E. Frost, T. J. Dysart, P. M. Kogge, and C. S. Lent, "Carbon nanotubes for quantum-dot cellular automata clocking," in *Proc. IEEE Conf. Nanotechnology*, 2004, pp. 171–173.
- [20] K. Hennessy and C. S. Lent, "Clocking of molecular quantum-dot cellular automata," *J. Vac. Sci. Technol. B, Microelectron. Process. Phenom.*, vol. 19, no. 5, pp. 1752–1755, Sep. 2001.
- [21] D. J. Griffiths, *Introduction to Quantum Mechanics*. Englewood Cliffs, NJ: Prentice-Hall, 1994.
- [22] C. S. Lent, P. D. Tougaw, and W. Porod, "Quantum cellular automata: The physics of computing with arrays of quantum dot molecules," in *Proc. Workshop PhysComp*, 1994, pp. 5–13.
- [23] H. T. Kung and C. E. Leiserson, "Systolic arrays (for VLSI)," in *Proc. Sparse Matrix*, I. S. Duff and G. W. Stewart, Eds., 1978, pp. 256–282.
- [24] S. Y. Kung, K. S. Arun, R. Gal-Ezer, and B. Rao, "Wavefront array processor: Language, architecture, and applications," *IEEE Trans. Comput.—Special Issue Parallel Distributed Processing*, vol. C-31, no. 11, pp. 1054–1066, Nov. 1982.
- [25] C. S. Lent and B. Isaksen, "Clocked molecular quantum-dot cellular automata," *IEEE Trans. Electron Devices*, vol. 50, no. 9, pp. 1890–1896, Sep. 2003.
- [26] V. Vankamamidi, M. Ottavi, and F. Lombardi, "Timing and clocking of QCA systems," Northeastern University, ECE Department, Boston, MA, 2004. Internal Report (available upon request).
- [27] M. Momenzadeh, J. Huang, and F. Lombardi, "Defect characterization and tolerance of QCA sequential devices and circuits," in *Proc. IEEE Int. Symp. Defect Fault Toler. VLSI Syst.*, 2005, pp. 199–207.



Vamsi Vankamamidi (S'07) received the B.S. degree in computer engineering from the University of Mumbai, Mumbai, India, in 2000, and the M.S. degree in electrical engineering and computer science from the University of Toledo, Toledo, OH, in 2001. He is currently working toward the Ph.D. degree in computer engineering in the Department of Electrical and Computer Engineering, Northeastern University, Boston, MA. As part of his dissertation, he is working on quantum-dot cellular automata, which is a nanoscale device architecture to supersede the conventional silicon-based technology. His research interests include design of nanoscale circuits and systems, electronic design automation, defect tolerance, and reliability.



**Marco Ottavi** (M'04) received the Laurea degree in electronic engineering from the University of Rome "La Sapienza," Rome, Italy, in 1999, and the Ph.D. degree in microelectronic and telecommunications engineering from the University of Rome "Tor Vergata," Rome, in 2004.

In 2000, he was with the ULISSE Consortium, Rome, as a Design Engineer of digital systems for space applications. In 2003, he was a Visiting Research Assistant with the Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, where he has been a Postdoctoral Research Associate since 2004. In 2006, he was a Visiting Research Scholar at Sandia National Laboratories, Albuquerque, NM. His research interests include yield and reliability modeling, fault-tolerant architectures, and online testing and design of nanoscale circuits and systems.



Fabrizio Lombardi (M'82–SM'02) received the B.Sc. (Hons.) degree in electronic engineering from the University of Essex, Colchester, U.K., in 1977, the Diploma degree in microwave engineering and the M.Sc. degree in microwave and modern optics from the Microwave Research Unit, University College London, London, U.K., in 1978, and the Ph.D. degree from the University of London, London, in 1982.

In 1977, he was with the Microwave Research Unit, University College London. He was a Faculty Member with the Texas Tech University, Lubbock, the University of Colorado–Boulder, Boulder, and the Texas A&M University, College Station. He is currently the holder of the International Test Conference Endowed Chair Professorship with the Northeastern University, Boston, MA. At the same institution, from 1998 to 2004, he served as Chair of the Department of Electrical and Computer Engineering. His research interests include testing and design of digital systems, bio- and nanocomputing, emerging technologies, defect tolerance, and computer-aided-design very large scale integration. He has extensively published in these areas and coauthored/edited seven books.

Dr. Lombardi has been involved in organizing many international symposia, conferences, and workshops sponsored by professional organizations. He has been the Chair of the Committee on "Nanotechnology Devices and Systems" of the Test Technology Technical Council of the IEEE since 2003. He is the Founding General Chair of the IEEE Symposium on Network Computing and Applications. He is also a Guest Editor of special issues in archival journals and magazines such as the IEEE TRANSACTIONS ON COMPUTERS, the IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, the IEEE Micro Magazine, and the IEEE Design and Test Magazine. He was an Associate Editor from 1996 to 2000 and the Associate Editor-in-Chief from 2000 to 2006 of the IEEE TRANSACTIONS ON COMPUTERS and was twice a Distinguished Visitor of the IEEE Computer Society in 1990-1993 and in 2001-2004. Since 2000, he has been an Associate Editor of the IEEE Design and Test Magazine and the ACM Journal on Emerging Technologies in Computing Systems. Since January 1, 2007, he has been the Editor-in-Chief of the IEEE TRANSACTIONS ON COMPUTERS. He was the recipient of the 1985-1986 Research Initiation Award from the IEEE/Engineering Foundation and the Silver Quill Award from Motorola, Austin, in 1996. He has received many professional awards, namely, the Visiting Fellowship at the British Columbia Advanced System Institute, University of Victoria, Victoria, BC, Canada, in 1988, and was twice a recipient of the Texas Experimental Engineering Station Research Fellowship in 1991-1992 and in 1997-1998, the Halliburton Professorship in 1995, the Outstanding Engineering Research Award from the Northeastern University in 2004, and the International Research Award from the Ministry of Science and Education of Japan in 1993-1999.