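This thesis proposes a joint design that combines the QR decomposition (QRD) circuit commonly used for signal detection with the geometric mean decomposition (GMD) circuit used for precoding, supporting a 4 × 4 antenna configuration. Its key features are matrix decomposition at a constant throughput and reduced computational complexity at both the algorithm level and the hardware level. At the algorithm level, the thesis adopts a divide-and-conquer GMD algorithm, which avoids the convergence problem of conventional singular value decomposition (SVD) and alleviates the permutation-condition checks required by conventional GMD, while lowering computational complexity and increasing parallelism. At the hardware level, the results of the algorithm mapping are used to analyze the parallelism of the circuit and to design the pipelined architecture, low-complexity coordinate rotation digital computers (CORDICs) are used to simplify the arithmetic circuits, and a number of circuit design and chip implementation techniques are applied to achieve a high operating frequency and low computational complexity. The chip was taped out in a TSMC 90 nm process with an area of 3.29 mm². It operates at 125 MHz and completes one QRD or GMD every 4 clock cycles, giving a throughput of 31.25M matrix decompositions per second. The total power consumption in the two modes is 87.5 mW and 148.4 mW, respectively.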
With the rapid development of VLSI and wireless communication technology, Multiple-Input Multiple-Output (MIMO) Orthogonal Frequency Division Multiplexing (OFDM) has become one of the key technologies of wireless communications in recent years. A closed-loop MIMO-OFDM architecture is enhanced by precoding techniques, in which the channel state information at the transmitter (CSIT) is obtained via feedback from the receiver. This scheme not only improves bit error rate (BER) performance effectively, but also greatly reduces the complexity of the signal detector. Precoding can be accomplished with matrix factorization methods such as singular value decomposition (SVD), geometric mean decomposition (GMD), and uniform channel decomposition (UCD).
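For reference, the standard textbook forms of the SVD and the GMD of a rank-K channel matrix can be written as follows. The symbols H, U, V, Q, P, R, s, n and the singular values σ_k are notation assumed here for illustration and do not come from the abstract.

```latex
% SVD: unequal subchannel gains \sigma_1 \ge \dots \ge \sigma_K > 0
\mathbf{H} = \mathbf{U}\,\boldsymbol{\Sigma}\,\mathbf{V}^{H},
\qquad \boldsymbol{\Sigma} = \operatorname{diag}(\sigma_1,\dots,\sigma_K)

% GMD: Q and P have orthonormal columns, R is upper triangular,
% and every diagonal entry of R equals the geometric mean of the singular values
\mathbf{H} = \mathbf{Q}\,\mathbf{R}\,\mathbf{P}^{H},
\qquad r_{ii} = \bar{\sigma} = \Bigl(\prod_{k=1}^{K}\sigma_k\Bigr)^{1/K},
\quad i = 1,\dots,K

% Precoding with P at the transmitter and filtering with Q^H at the receiver
% leaves a triangular system with identical gain on every subchannel
\mathbf{y} = \mathbf{H}\mathbf{P}\mathbf{s} + \mathbf{n}
\;\;\Longrightarrow\;\;
\mathbf{Q}^{H}\mathbf{y} = \mathbf{R}\,\mathbf{s} + \mathbf{Q}^{H}\mathbf{n}
```

Because R is triangular, the receiver can detect the symbols by successive cancellation, which is one common way to see why a GMD-precoded link needs only a simple detector.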
In this thesis, we present a unified design that combines QR decomposition (QRD) for signal detection with GMD for precoding. It supports MIMO systems with up to 4 × 4 antennas. The chip has two notable merits: first, it performs matrix factorization at a constant throughput; second, it reduces computational complexity at both the algorithm level and the hardware level. At the algorithm level, we propose a novel GMD computing scheme based on a divide-and-conquer approach. It not only avoids the convergence issues of conventional SVD, but also alleviates the permutation operations required by traditional GMD schemes. In addition, it reduces computational complexity and improves computing parallelism at the same time. At the hardware level, we use the results of the algorithm mapping to evaluate various parallel and pipelined architectures and derive our hardware design. We further employ low-complexity coordinate rotation digital computers (CORDICs) to simplify the arithmetic units. Moreover, various circuit design and chip implementation techniques are applied to raise the operating frequency and lower the computational complexity. Implementation results for the 4 × 4 QRD/GMD chip in a TSMC 90 nm CMOS process show a chip area of 3.29 mm²; the chip computes one 4 × 4 QRD or one 4 × 4 GMD every 4 cycles at a clock frequency of 125 MHz. The total power consumption in the two modes is 87.5 mW and 148.4 mW, respectively. The design provides a throughput of 31.25M matrix decompositions per second.
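To illustrate how a CORDIC can replace multipliers and square roots in this kind of arithmetic unit, the following is a minimal floating-point sketch of the vectoring mode, which rotates a vector onto the x-axis and is typically the primitive that zeroes a matrix entry in rotation-based QR decomposition. It is not the chip's actual datapath: the iteration count `N_ITER`, the function name `cordic_vectoring`, and the use of floating point instead of fixed-point shifts are all assumptions made for illustration.

```python
import math

N_ITER = 16  # assumed iteration count; the chip's real word length is not stated

# Elementary angles atan(2^-i) and the accumulated gain of the micro-rotations.
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]
GAIN = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(N_ITER))


def cordic_vectoring(x, y):
    """Return (r, theta) with r ~ hypot(x, y) and theta ~ atan2(y, x).

    Each iteration uses only additions and multiplications by 2^-i,
    which are plain shifts in fixed-point hardware.
    """
    theta = 0.0
    if x < 0.0:
        # Pre-rotate by pi so the vector lies in the right half-plane,
        # where the micro-rotations are guaranteed to converge.
        theta = math.pi if y >= 0.0 else -math.pi
        x, y = -x, -y
    for i in range(N_ITER):
        step = 2.0 ** -i
        if y >= 0.0:
            # Rotate clockwise to push y towards zero.
            x, y = x + y * step, y - x * step
            theta += ANGLES[i]
        else:
            # Rotate counter-clockwise to push y towards zero.
            x, y = x - y * step, y + x * step
            theta -= ANGLES[i]
    # Remove the constant CORDIC gain to recover the true magnitude.
    return x / GAIN, theta


if __name__ == "__main__":
    r, theta = cordic_vectoring(3.0, 4.0)
    print(r, theta)
```

For the input (3, 4) the returned magnitude approaches 5 and the angle approaches atan2(4, 3) as `N_ITER` grows. The throughput figure quoted above follows directly from the clock rate: 125 MHz divided by 4 cycles per decomposition gives 31.25M decompositions per second.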