The circuit performs two-dimensional forward and inverse discrete cosine transforms (DCT) on 8×8 blocks of data. Its implementation is based on the row-column decomposition scheme. A memory look-up approach combined with bit-serial structures is used to compute each one-dimensional DCT. A register-based transposition stage maintains the serial representation of the data after the first one-dimensional transform. This 50,000-transistor circuit uses only read-only memories, registers, and adders. A pipelined architecture and a very regular layout give high-speed performance, up to digital TV rates. The 32-pin version of the circuit accepts 9-bit pixel input and produces 12-bit coefficients in forward mode, and vice versa in inverse DCT mode. Its area is 26 mm² in a 1.2-µm CMOS technology.
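
The row-column scheme and the look-up, bit-serial computation described above can be illustrated with a small software sketch. It is not the circuit's actual design: the orthonormal DCT-II scaling, the single 256-entry table per output, the floating-point table contents, and all function names (`dct_matrix`, `transform_2d`, `build_rom`, `da_dot`) are assumptions made for illustration, since the abstract does not give the ROM organization or internal word lengths.

```python
import math

N = 8  # block size used by the circuit (8x8)

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix C (assumed scaling): y = C x, x = C^T y."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows

C = dct_matrix()

def transform_1d(vec, inverse=False):
    """One-dimensional forward (C x) or inverse (C^T y) DCT of one row."""
    n = len(vec)
    if inverse:
        return [sum(C[k][i] * vec[k] for k in range(n)) for i in range(n)]
    return [sum(C[k][i] * vec[i] for i in range(n)) for k in range(n)]

def transpose(block):
    """Software stand-in for the transposition stage between the two passes."""
    return [list(col) for col in zip(*block)]

def transform_2d(block, inverse=False):
    """Row-column decomposition: 1-D pass on rows, transpose, 1-D pass again."""
    first_pass = [transform_1d(row, inverse) for row in block]
    second_pass = [transform_1d(row, inverse) for row in transpose(first_pass)]
    return transpose(second_pass)  # restore the original orientation

def build_rom(coeffs):
    """Table of all partial sums: entry addr = sum of coeffs[i] where bit i of addr is 1."""
    n = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << n)]

def da_dot(rom, samples, bits):
    """Inner product by look-up: one table access per bit position.

    samples are two's-complement integers that fit in `bits` bits;
    the most significant (sign) bit carries negative weight.
    """
    acc = 0.0
    for j in range(bits):
        addr = 0
        for i, x in enumerate(samples):
            addr |= ((x >> j) & 1) << i      # gather bit j of every sample
        term = rom[addr] * (1 << j)           # weight the partial sum by 2^j
        acc += -term if j == bits - 1 else term
    return acc
```

In this sketch each 1-D output needs only table look-ups, shifts, and additions, which mirrors the abstract's statement that the circuit is built from read-only memories, registers, and adders. For example, `da_dot(build_rom(C[1]), row, 9)` reproduces the second DCT coefficient of a row of 9-bit signed samples, and `transform_2d(transform_2d(block), inverse=True)` recovers `block` up to floating-point rounding.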