Résumé : How the human brain uses self-generated auditory information during speech production is rather unsettled. Current theories of language production consider a feedback monitoring system that monitors the auditory consequences of speech output and an internal monitoring system, which makes predictions about the auditory consequences of speech before its production. To gain novel insights into underlying neural processes, we investigated the coupling between neuromagnetic activity and the temporal envelope of the heard speech sounds (i.e., cortical tracking of speech) in a group of adults who 1) read a text aloud, 2) listened to a recording of their own speech (i.e., playback), and 3) listened to another speech recording. Reading aloud was here used as a particular form of speech production that shares various processes with natural speech. During reading aloud, the reader's brain tracked the slow temporal fluctuations of the speech output. Specifically, auditory cortices tracked phrases (<1 ​Hz) but to a lesser extent than during the two speech listening conditions. Also, the tracking of words (2–4 ​Hz) and syllables (4–8 ​Hz) occurred at parietal opercula during reading aloud and at auditory cortices during listening. Directionality analyses were then used to get insights into the monitoring systems involved in the processing of self-generated auditory information. Analyses revealed that the cortical tracking of speech at <1 ​Hz, 2–4 ​Hz and 4–8 ​Hz is dominated by speech-to-brain directional coupling during both reading aloud and listening, i.e., the cortical tracking of speech during reading aloud mainly entails auditory feedback processing. Nevertheless, brain-to-speech directional coupling at 4–8 ​Hz was enhanced during reading aloud compared with listening, likely reflecting the establishment of predictions about the auditory consequences of speech before production. These data bring novel insights into how auditory verbal information is tracked by the human brain during perception and self-generation of connected speech.