5. Conclusion
This work has presented a CUDA-based parallel implementation of a flexible, two-phase, 3D Forward Reservoir Simulation (FRS). The results showed that the CUDA parallel implementation of FRS can solve a problem 82 times larger than its serial counterpart. Moreover, when combined with proper preconditioning, BiCGSTAB proved to be a stable solver that can be used in such simulations in place of the commonly used but more expensive GMRES, whose long recurrences require storage that grows with the number of iterations. Despite the achieved performance, the current implementation uses many registers per kernel, which limits the number of blocks that can run concurrently and thereby weakens the GPU's ability to hide latency. Various optimization opportunities, detailed documentation of the implementation, and the source code will be presented in a separate work. Beyond these observations, which require more in-depth investigation, implementing a parallel oil reservoir simulator in CUDA is only the first step toward many further studies. Future work includes a MIC-based FRS implementation, an OpenACC-based FRS implementation, FRS on a cluster of GPUs, the use of multigrid preconditioners, and tests of different Krylov solver variants, among others.