This is all part of a memory system called working memory. Working memory has several components, one of which, the phonological loop, stores and manipulates auditory information. The phonological loop itself has two subsystems. The first is the phonological store, which works much like a tape loop about 2 seconds long: it holds auditory information for a couple of seconds, or for as long as you keep rehearsing it. The second is the articulatory (sub-vocal) rehearsal module: the voice you use, for example, when you read, or when you repeat things in your mind to learn them.
Another part of working memory is the visuo-spatial sketchpad. It serves a similar purpose for vision: it temporarily keeps in mind the things we see. It includes 3 subsystems, one of which, the visual buffer, holds your conscious visual imagery.
The two systems are related, and information can be translated in your mind from one to the other. For example, when you read a book, the information starts out in the visuo-spatial sketchpad, but for you to understand the content, your inner voice "translates" it into "auditory form".
I recommend checking out Alan Baddeley's work for details, such as: Baddeley, A. D. (2007). Working Memory, Thought, and Action. Oxford: Oxford University Press.
Source link: http://www.reddit.com/r/psychology/comments/ygd8w/the_voice_in_everyones_head_what_exactly_is_it/c5vcaie