The inputs to the ALU are the data to be operated on (called operands) and a code from the control unit indicating which operation to perform. Its output is the result of the computation. In many designs, the ALU also reads condition codes from, or writes them to, a status register. These codes indicate conditions such as carry-in or carry-out, overflow, and divide-by-zero.
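To make the condition codes concrete, here is a minimal Python sketch of 4-bit addition that also produces carry-out, overflow, zero, and negative flags. It assumes the common two's-complement flag definitions; the flag names C, V, Z, N and the function `add_with_flags` are illustrative, not taken from the source.

```python
"""Sketch: 4-bit addition with condition codes (carry, overflow, zero, negative).

Assumption: flags follow the common two's-complement convention; the source
does not specify which flags its ALU produces.
"""

WIDTH = 4
MASK = (1 << WIDTH) - 1   # 0b1111: keeps results to 4 bits
SIGN = 1 << (WIDTH - 1)   # 0b1000: sign bit of a 4-bit two's-complement value


def add_with_flags(a: int, b: int, carry_in: int = 0) -> tuple[int, dict[str, int]]:
    """Add two 4-bit operands plus carry-in; return (result, status flags)."""
    a &= MASK
    b &= MASK
    raw = a + b + (carry_in & 1)
    result = raw & MASK
    flags = {
        "C": (raw >> WIDTH) & 1,  # carry-out of the most significant bit
        # Signed overflow: both operands have a sign that differs from the result's sign.
        "V": int(((a ^ result) & (b ^ result) & SIGN) != 0),
        "Z": int(result == 0),           # result is zero
        "N": int((result & SIGN) != 0),  # result is negative (sign bit set)
    }
    return result, flags
```

For example, 0111 + 0001 gives 1000 with V = 1 (7 + 1 leaves the signed 4-bit range) and C = 0, while 1111 + 0001 gives 0000 with C = 1 and Z = 1.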
The system block diagram of a 4-bit ALU is shown in Figure 1.6. The ALU is a combinational circuit that performs logic and arithmetic micro-operations on a pair of n-bit operands (e.g., A[3:0] and B[3:0] for n = 4). The operations it performs are controlled by a set of function-select inputs. This design is a 4-bit ALU with three function-select inputs: mode M and select inputs S1 and S0. The mode input M selects between a logic (M = 0) and an arithmetic (M = 1) operation.
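A behavioral model can make the roles of M, S1, and S0 concrete. The Python sketch below uses a hypothetical assignment of (S1, S0) codes to operations, since only the meaning of M is given in the description; `alu`, `LOGIC_OPS`, and `ARITH_OPS` are illustrative names.

```python
"""Behavioral sketch of a 4-bit ALU with mode input M and select inputs S1, S0.

The mapping of (S1, S0) codes to operations below is a hypothetical example;
the source only states that M = 0 selects logic and M = 1 selects arithmetic.
"""

MASK = 0xF  # keep every result to 4 bits

LOGIC_OPS = {                     # M = 0
    0b00: lambda a, b: a & b,     # AND
    0b01: lambda a, b: a | b,     # OR
    0b10: lambda a, b: a ^ b,     # XOR
    0b11: lambda a, b: ~a,        # NOT A (masked to 4 bits below)
}

ARITH_OPS = {                     # M = 1 (modulo 16, carry-out discarded)
    0b00: lambda a, b: a + b,     # add
    0b01: lambda a, b: a - b,     # subtract (two's complement)
    0b10: lambda a, b: a + 1,     # increment A
    0b11: lambda a, b: a - 1,     # decrement A
}


def alu(a: int, b: int, m: int, s1: int, s0: int) -> int:
    """Return the 4-bit result selected by M, S1, S0."""
    select = ((s1 & 1) << 1) | (s0 & 1)
    table = ARITH_OPS if m & 1 else LOGIC_OPS
    return table[select](a & MASK, b & MASK) & MASK
```

Keeping the operations in lookup tables mirrors the two-level selection (M first, then S1 S0) and makes such a model convenient as a reference when checking a VHDL testbench's outputs.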
Synthesis and simulation were performed using Xilinx 11.1 and ModelSim SE 5.7, which were also used for coding, testing, and simulating the VHDL programs. The layout of the 4-bit ALU before fabrication, drawn with Mentor Graphics 2007, is shown in Figure 3.4. The design was fabricated in the AMI 1.5 µm CMOS process, and the 4-bit ALU occupies approximately 830 µm × 935 µm (about 0.78 mm²). SPICE simulations of the 4-bit ALU were run on the post-layout extracted netlists. The RTL schematic block of the 7-bit ALU is shown in Figure 3.1.
This paper presents the design, optimization, and implementation of a 4-bit ALU. The growing demand for low-power very-large-scale integration (VLSI) can be addressed at several design levels: architectural, circuit, layout, and process technology. The simulation results show reduced output-signal delay and less waveform distortion at the output stages. Because of these advantages, the proposed design may be suitable for DSP applications.
Abstract: Resistive random access memory (RRAM) is one of the most promising emerging nonvolatile memories because of its excellent properties. Its fast read speed and low operating voltage make it suitable for applications that write rarely but read frequently. However, as technology nodes shrink, RRAM faces many issues that can significantly degrade its performance. It is therefore necessary to optimize sensing schemes to broaden the range of RRAM applications. In this paper, the issues RRAM faces at advanced technology nodes are summarized. Then, the advantages and weaknesses of novel design and optimization methodologies for sensing schemes, particularly those developed in recent years, are described in detail from three aspects: reference schemes, sense amplifier schemes, and bit line (BL)-enhancing schemes. These can serve as a reference for designing sensing schemes. The waveforms and results of each method are also illustrated to make the designs easy to understand. As technology has developed, RRAM sensing schemes have achieved higher speed and resolution and lower power consumption, and they are applied at advanced technology nodes and low working voltages. Currently, the most advanced node at which RRAM has been applied is 14 nm, the lowest working voltage reaches 0.32 V, and the shortest access time is only a few nanoseconds.
Keywords: RRAM; reference schemes; sensing schemes; BL-enhancing schemes
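Reference schemes can be illustrated with the simplest case: a current-mode read in which a sense amplifier compares the cell current with a reference current placed midway between the low-resistance-state (LRS) and high-resistance-state (HRS) currents. The Python sketch below assumes purely ohmic cells, an ideal comparator, and hypothetical resistances and read voltage; it is not any specific scheme from the surveyed literature, and the function names are illustrative.

```python
"""Sketch: current-mode RRAM read with a mid-point reference current.

Assumptions (illustrative, not from the source): a fixed read voltage, one
low-resistance state (LRS) and one high-resistance state (HRS), ohmic cells,
and a reference current set halfway between the two cell currents.
"""


def cell_current(v_read: float, r_cell: float) -> float:
    """Ohmic read current of a cell (ignores selector and bit-line resistance)."""
    return v_read / r_cell


def midpoint_reference(v_read: float, r_lrs: float, r_hrs: float) -> float:
    """Reference current halfway between the LRS and HRS read currents."""
    return 0.5 * (cell_current(v_read, r_lrs) + cell_current(v_read, r_hrs))


def sense(v_read: float, r_cell: float, i_ref: float) -> int:
    """Idealized sense amplifier: 1 if the cell current exceeds the reference (LRS), else 0."""
    return 1 if cell_current(v_read, r_cell) > i_ref else 0


def sense_margin(v_read: float, r_lrs: float, r_hrs: float) -> float:
    """Smaller of the two gaps between the reference and the LRS/HRS currents."""
    i_ref = midpoint_reference(v_read, r_lrs, r_hrs)
    return min(cell_current(v_read, r_lrs) - i_ref,
               i_ref - cell_current(v_read, r_hrs))
```

With V_read = 0.2 V, R_LRS = 10 kΩ, and R_HRS = 100 kΩ (all hypothetical), the reference is 11 µA and the sense margin is 9 µA. Because the margin scales linearly with V_read in this model, halving the read voltage halves the margin, which illustrates why low-voltage operation puts pressure on reference and sense amplifier design.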
We present a processing-in-memory (PIM)-based hardware platform, referred to as MONETA, for on-chip acceleration of inference and learning in hybrid convolutional spiking neural networks. MONETA uses 8T static random-access memory (SRAM)-based PIM cores for vector-matrix multiplication (VMM), augmented with spike-timing-dependent plasticity (STDP)-based weight updates. A spiking neural network (SNN)-focused data flow is presented to minimize data movement in MONETA while preserving learning accuracy. MONETA supports online, on-chip training on the PIM architecture. With the proposed data flow, 4-bit input precision, and 8-bit weight precision, the STDP-trained convolutional neural network within an SNN (ConvSNN) achieves CIFAR-10 accuracy only 1.63% lower than a software implementation of STDP. Further, the proposed architecture is used to accelerate a hybrid SNN architecture that couples off-chip supervised (backpropagation through time) and on-chip unsupervised (STDP) training. We also evaluate this hybrid network architecture with the proposed data flow. On CIFAR-10, its accuracy is 10.84% higher than the STDP-trained result and 1.4% higher than the backpropagation-trained ConvSNN result. A physical design of MONETA in 65 nm complementary metal-oxide-semiconductor (CMOS) achieves power efficiencies of 18.69, 7.25, and 10.41 tera-operations per second per watt (TOPS/W) in the inference, learning, and hybrid learning modes, respectively.
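For readers unfamiliar with STDP, the Python sketch below shows the classic pair-based exponential rule applied to an unsigned 8-bit weight code, matching the 8-bit weight precision mentioned in the abstract. The rule's form is the widely used textbook version; the constants `A_PLUS`, `A_MINUS`, `TAU_PLUS`, `TAU_MINUS` and the functions are hypothetical, and MONETA's actual update rule and circuit may differ.

```python
"""Sketch: pair-based exponential STDP weight update with 8-bit weight storage.

Assumptions (not from the source): the classic exponential STDP window,
hypothetical constants A_PLUS, A_MINUS, TAU_PLUS, TAU_MINUS, and unsigned
8-bit weights. MONETA's actual update rule is not reproduced here.
"""
import math

A_PLUS, A_MINUS = 8.0, 6.0        # hypothetical potentiation/depression amplitudes (weight LSBs)
TAU_PLUS, TAU_MINUS = 20.0, 20.0  # hypothetical time constants (in time steps)
W_MIN, W_MAX = 0, 255             # unsigned 8-bit weight range


def stdp_delta(t_pre: float, t_post: float) -> float:
    """Weight change for one pre/post spike pair, with dt = t_post - t_pre."""
    dt = t_post - t_pre
    if dt > 0:   # pre fires before post: potentiate
        return A_PLUS * math.exp(-dt / TAU_PLUS)
    if dt < 0:   # post fires before pre: depress
        return -A_MINUS * math.exp(dt / TAU_MINUS)
    return 0.0   # simultaneous spikes: no change in this sketch


def update_weight(w: int, t_pre: float, t_post: float) -> int:
    """Apply the STDP change, round to an integer code, and clip to 8 bits."""
    new_w = round(w + stdp_delta(t_pre, t_post))
    return max(W_MIN, min(W_MAX, new_w))
```

Rounding and clipping to [0, 255] mimic storing the weight as an 8-bit code; for example, with these constants a pre-spike one time step before a post-spike raises a weight of 100 to 108.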
Parhi Research Group
Research is focused on all aspects of VLSI signal and image processing, from algorithm and architecture design to the design of digital integrated circuits and computer-aided design tools. Our emphasis is on developing architectures and algorithms that operate at high speed, in less area, or with lower power. Different applications impose different speed-power demands on implementations of the same algorithm: video and radar applications require high speed, whereas wireless and personal communication systems require low-power implementations. In addition to studying VLSI implementation styles, we study computer arithmetic implementations and the design of CAD tools for high-level synthesis of digital signal processing (DSP) systems, and for multiprocessor prototyping and task scheduling of software-programmable DSP systems using data-flow graph models. Very high-speed architectures are designed through novel use of look-ahead computation, pipelining, and retiming. Recent work has addressed pipelined designs for parallel decision feedback equalizers, Tomlinson-Harashima precoders, Viterbi decoders, linear-feedback shift registers, and multi-gigabit transceivers. Significant research has been directed toward parallel and pipelined implementations of turbo decoders, low-power implementations of low-density parity-check codes, and crypto-accelerators. Current research on low-power design is based on implementations using supply voltage overscaling and subthreshold circuit design.
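Look-ahead computation can be illustrated on the simplest recursive filter. The Python sketch below applies the standard M-step look-ahead transformation to a first-order IIR recursion and checks it against the direct form; it is a textbook illustration, not one of the group's specific designs, and the function names are hypothetical.

```python
"""Sketch: look-ahead transformation of a first-order IIR recursion.

Original loop (one delay in the feedback path):
    y[n] = a * y[n-1] + x[n]
After M-step look-ahead (M delays in the feedback path):
    y[n] = a**M * y[n-M] + sum_{k=0}^{M-1} a**k * x[n-k]
The extra delays in the loop can then be retimed into pipeline registers.
"""


def iir_direct(x: list[float], a: float) -> list[float]:
    """Direct-form recursion with zero initial state."""
    y, prev = [], 0.0
    for xn in x:
        prev = a * prev + xn
        y.append(prev)
    return y


def iir_lookahead(x: list[float], a: float, m: int) -> list[float]:
    """Same filter, but each output depends on y[n-m] instead of y[n-1]."""
    y: list[float] = []
    for n in range(len(x)):
        # Recursive part: m delays in the loop (zero initial state for n < m).
        feedback = a**m * y[n - m] if n >= m else 0.0
        # Non-recursive part: a^k * x[n-k] for k = 0..m-1.
        ff = sum(a**k * x[n - k] for k in range(m) if n - k >= 0)
        y.append(feedback + ff)
    return y
```

The transformed loop contains M delays, so the feedback multiply-add can be pipelined or retimed into M stages. The price is M - 1 extra multiply-adds in the non-recursive part, which lies outside the loop and can be pipelined freely.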
2022 Charles E. Bowers Faculty Teaching Award
2021 IEEE Circuits and Systems Society John Choma Education Award
2020 Association for Computing Machinery (ACM) Fellow for contributions to architectures and design tools for signal processing and networking accelerators
2020 Fellow, National Academy of Inventors (NAI)
2019-2021 IEEE Circuits and Systems Society Distinguished Lecturer
2017 Fellow, American Association for the Advancement of Science (AAAS)
2017 Mac Van Valkenburg Award from the IEEE Circuits and Systems Society
2013 Distinguished Alumnus Award, Indian Institute of Technology, Kharagpur
2013 Award for Outstanding Contributions to Postbaccalaureate, Graduate, and Professional Education, University of Minnesota
2012 Charles A. Desoer Technical Achievement Award from the IEEE Circuits and Systems Society
2004 Frederick Emmons Terman Award from the American Society for Engineering Education
2003 IEEE Kiyo Tomiyasu Technical Field Award
2000-present Distinguished McKnight University Professorship
1997-present Edgar F. Johnson Professorship at the University of Minnesota
1996 IEEE Fellow for contributions to the fields of VLSI digital signal processing architectures, design methodologies and tools
1996-1998 IEEE Circuits and Systems Society Distinguished Lecturer
1992-1997 National Science Foundation Young Investigator Award
1992-1994 McKnight-Land Grant Professorship at the University of Minnesota
Existing computational models of the retina often trade biophysical accuracy against an implementation methodology that can be adapted to hardware. Compared with current modes of vision restoration, algorithmic models often achieve a greater correlation between stimuli and the response of the affected neural network, but they lack practicality as physical hardware. If present processing methods are adapted to complement very-large-scale integration (VLSI) circuit design techniques, they are expected to offer a more feasible approach to physically constructing an artificial retina. The computational model presented in this research aims to provide a fast and accurate predictive model of the retina, a deeper understanding of neural responses to visual stimulation, and an architecture that can realistically be transformed into a hardware device. Traditionally, implicit (or semi-implicit) ordinary differential equation (ODE) solvers have been favored for their speed and accuracy. We present a novel approach that integrates different dynamical time scales within a unified framework of neural responses, in which the rod, cone, amacrine, bipolar, and ganglion cells correspond to the implemented pathways. Furthermore, we show that adopting numerical integration can, in some cases, accelerate retinal pathway simulations by more than 50% compared with traditional ODE solvers, and that it is a more realizable solution for the hardware implementation of predictive retinal models.
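As a rough illustration of fixed-step numerical integration across different time scales, the Python sketch below integrates a hypothetical two-stage cascade of first-order low-pass filters with forward Euler. It is not the authors' retinal model: the stage equations, the time constants `TAU_FAST` and `TAU_SLOW`, and the function `simulate` are assumptions. It does show one constraint any such fixed-step approach must respect: the explicit step size is limited by the fastest time constant.

```python
"""Sketch: fixed-step explicit (forward-Euler) integration of a two-stage
cascade of first-order low-pass filters with different time constants.

Illustrative stand-in only, not the model from the source; the stage
structure and the time constants TAU_FAST and TAU_SLOW are hypothetical.
    TAU_FAST * dv1/dt = stimulus(t) - v1
    TAU_SLOW * dv2/dt = v1 - v2
"""
from typing import Callable

TAU_FAST = 0.010  # s, hypothetical fast stage
TAU_SLOW = 0.100  # s, hypothetical slow stage


def simulate(stimulus: Callable[[float], float], t_end: float,
             dt: float) -> list[tuple[float, float, float]]:
    """Return (t, v1, v2) samples; forward Euler needs dt < 2 * TAU_FAST to be stable."""
    if dt >= 2 * TAU_FAST:
        raise ValueError("step too large for explicit Euler stability")
    v1 = v2 = 0.0
    t = 0.0
    out = [(t, v1, v2)]
    for _ in range(round(t_end / dt)):
        # Evaluate both derivatives at the current state, then step together.
        dv1 = (stimulus(t) - v1) / TAU_FAST
        dv2 = (v1 - v2) / TAU_SLOW
        v1 += dt * dv1
        v2 += dt * dv2
        t += dt
        out.append((t, v1, v2))
    return out
```

For a step input, `simulate(lambda t: 1.0, 1.0, 0.001)` produces a fast stage that settles within tens of milliseconds and a slow stage that lags behind it. A step of 0.02 s or more is rejected because it would sit at or beyond the forward-Euler stability limit for the 10 ms stage.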
For multiple-input, multiple-output (MIMO) systems, it is essential to design new RF switches that can operate with multiple antennas and frequencies while minimizing signal distortion and power consumption [6-8]. Large improvements in the frequency response of Si-CMOS devices have encouraged their use in the millimeter-wave regime, for example in high-capacity wireless local area networks, short-range high-data-rate wireless personal area networks, and collision-avoidance radar for automobiles. Using Si-CMOS for these applications enables higher levels of integration and lower cost while improving efficiency. Power amplifiers operating at 60 GHz have been designed in 65-nm technology [9,10], and more recently a few research groups have demonstrated 60 GHz power amplifiers in 45-nm technologies.