Today's design thinking is very focused on the application level: optimizing the operations of a given software service.
This means you try to create simple, atomic operations that can be called from your business processes. Each service can then be distributed and scaled using mainstream deployment patterns.
So far so good. But here is what you might see once you do this: running one thread on a single machine gives you the predictable performance you need, yet with 8 concurrent threads each single operation takes much longer to execute.
Have you seen this behavior in one of your applications too? Then you probably suspected concurrency issues, and once you had eliminated all application-related problems you became aware that even today, hardware-related optimization is something you have to take care of. Really?
In my daily work I see applications that don't scale well on a single machine. They are very basic in terms of application-level algorithms, but those algorithms follow patterns that cause memory contention.
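One such pattern, as a minimal sketch (class and method names are my own, for illustration): many threads incrementing a single shared counter. Every update forces the cache line holding the counter to bounce between cores, so per-operation time grows with the thread count. The JDK's `LongAdder` mitigates exactly this by striping the count across per-thread cells:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

public class ContentionDemo {

    // Every thread hammers the same AtomicLong: each increment is a
    // compare-and-set on one shared cache line, which ping-pongs
    // between cores as the thread count grows.
    static long contended(int threads, long opsPerThread) throws InterruptedException {
        AtomicLong counter = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (long n = 0; n < opsPerThread; n++) counter.incrementAndGet();
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        return counter.get();
    }

    // LongAdder keeps per-thread cells and only sums them on read,
    // so the hot path rarely touches a shared cache line.
    static long striped(int threads, long opsPerThread) throws InterruptedException {
        LongAdder adder = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (long n = 0; n < opsPerThread; n++) adder.increment();
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        return adder.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        long ops = 1_000_000;
        long t0 = System.nanoTime();
        contended(8, ops);
        System.out.println("contended AtomicLong: " + (System.nanoTime() - t0) / 1_000_000 + " ms");
        t0 = System.nanoTime();
        striped(8, ops);
        System.out.println("striped LongAdder:    " + (System.nanoTime() - t0) / 1_000_000 + " ms");
    }
}
```

Both variants produce the same total, but on a multi-core machine the contended version typically takes noticeably longer per operation — the application logic is trivial, the slowdown comes purely from the memory system.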
So you still have to understand the low-level architecture and find ways to optimize the basic algorithms in your code.
Have a look at "Lock-Free Algorithms" to get a very good overview of how such things still affect the concurrency behavior of your application. You should also watch "Beginner's Guide to Concurrency" from Trisha Gee and Michael Barker.
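To make "lock-free" concrete, here is a sketch of the classic Treiber stack (a textbook lock-free structure, not taken from the talks above): instead of guarding the head pointer with a lock, push and pop retry a compare-and-set until it succeeds.

```java
import java.util.concurrent.atomic.AtomicReference;

// Treiber stack: a lock-free LIFO. Threads never block each other;
// on contention a thread simply re-reads the head and retries its CAS.
public class LockFreeStack<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    public void push(T value) {
        Node<T> oldHead;
        Node<T> newHead;
        do {
            oldHead = head.get();
            newHead = new Node<>(value, oldHead);
        } while (!head.compareAndSet(oldHead, newHead)); // retry if another thread won the race
    }

    public T pop() {
        Node<T> oldHead;
        do {
            oldHead = head.get();
            if (oldHead == null) return null; // stack is empty
        } while (!head.compareAndSet(oldHead, oldHead.next));
        return oldHead.value;
    }
}
```

Note that lock-free does not mean contention-free: under heavy load all those retrying CAS operations still fight over the same cache line, which is exactly the kind of effect the talks explain in depth.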
There you also get hints and estimates on how virtualization might affect your performance.
Choosing the right hardware still matters in operational scenarios where concurrency is used to scale your application AND scalability is a core success factor of that application.