Large systems of linear equations of the form (Ax = b) arise in applications such as finite element analysis, computational fluid dynamics, and power systems analysis. Solving them has high algorithmic complexity and therefore takes a long execution time. The computational power needed to solve such problems quickly is beyond the reach of present-day conventional uniprocessors. Furthermore, uniprocessor performance tends to saturate early relative to cost: even modest gains in the performance of a uniprocessor come at an exorbitant increase in its cost, which makes new technologies necessary to reduce execution time. This paper presents a parallel implementation of the classical solution of a system of linear equations that achieves a high yet reasonable speedup. The speedup is obtained through fine granularity in data and tasks, and through asynchronicity that hides memory access latency.
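For orientation, the following is a minimal serial sketch of one classical direct method for Ax = b, Gaussian elimination with partial pivoting. The abstract does not name the method the paper parallelizes, so the choice of algorithm is an assumption, and the function name `solve_linear_system` is hypothetical. The sketch only marks where independent row updates occur, which is the kind of fine-grained data parallelism a parallel version might exploit; it is not the paper's implementation.

```python
def solve_linear_system(a, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting.

    Serial illustration only; the method and the name are assumptions,
    not taken from the paper.
    """
    n = len(b)
    # Build an augmented copy [A | b] so the caller's data is untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]

    for k in range(n):
        # Partial pivoting: choose the row with the largest entry in column k.
        pivot = max(range(k, n), key=lambda r: abs(m[r][k]))
        if m[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[pivot] = m[pivot], m[k]

        # For a fixed k, each row update below is independent of the others.
        # A parallel version could distribute these rows across workers.
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]

    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x
```

Calling `solve_linear_system([[2, 1], [1, 3]], [3, 5])` returns values close to 0.8 and 1.4, up to floating-point rounding. The elimination phase costs on the order of n^3 operations, which is why large systems motivate parallel execution.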