The obvious way to make a computer system more powerful is to make the processor as fast as possible. Furthermore, adopting a large number of such fast processors would be the next step. This multiprocessor system could be useful only if it distributes workload uniformly and if its processors are fully utilized. To achieve a higher processor utilization, memory access latency must be reduced as much as possible and even more the remaining latency must be hidden. The actual latency can be reduced by using fast logic and the effective latency can be reduced by using cache. This article discusses what the memory latency problem is, how serious it is by presenting analytical and simulation results, and existing techniques for coping with it; such as write-buffer, relaxed consistency model, multi-threading, data locality optimization, data forwarding, and data prefetching.