Abstract: Multiple auditory structures, from cochlea to cortex, phase-lock to the envelope of complex stimuli. The relative contributions of these structures to the human surface-recorded envelope-following response (EFR) are still uncertain. Identification of the active contributor(s) is complicated by the fact that even the simplest two-tone (f1&f2) stimulus, targeting its (f2−f1) envelope, evokes additional linear (f1&f2) and non-linear (2f1−f2) phase-locked components as well as a transient auditory brainstem response (ABR). Here, we took advantage of the generalized primary tone phase variation method to isolate each predictable component in the time domain, allowing direct measurements of onset latency, duration and phase discontinuity values from which the involved generators were inferred. Targeting several envelope frequencies (0.22–1 kHz), we derived the EFR transfer functions along a vertical vertex-to-neck and a horizontal earlobe-to-earlobe recording channel, yielding EFR-V and EFR-H waveforms, respectively. Subjects (N = 30) were sleeping children with normal electrophysiological thresholds and normal oto-acoustic emissions. The phase-locking value (PLV) transfer functions of both EFR-H and EFR-V had a low-pass profile, with EFR-V showing a lower cut-off frequency than EFR-H. We also computed the latency-frequency relationship of the onset latency of each EFR. EFR-H data fitted a power-law function incorporating a frequency-dependent traveling wave delay and a fixed delay of 1.2 ms. The fitted function fell within the range of five published estimates of the latency-frequency function of ABR wave-I, thus pointing to a cochlear nerve origin. The absence of phase discontinuity, together with overall response durations equal to that of the stimulus, indicated no contribution from a later generator.
The recording of a fully comparable EFR-H response in a patient who had severe brainstem encephalitis with a normal, isolated ABR wave-I but complete absence of later waves further substantiated a cochlear nerve origin. Modeling of the EFR-V latency-frequency functions indicated a fixed transport time of 2 ms with respect to EFR-H onset, suggesting a cochlear nucleus (CN) origin, again without indication of multiple generators. Other features of the EFR-V response pointing to the CN were, at least for EFR frequencies below the cut-off values of the transfer functions, higher PLVs coupled with increased harmonic distortion. Such behavior has been described in the so-called highly-synchronized neurons of the ventral cochlear nucleus (VCN). The present study demonstrates the advantage of isolating the EFR in the temporal domain so as to extract detailed spectro-temporal parameters that, combined with orthogonal recording channels, shed new light on the involved neural generators.
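
For readers unfamiliar with the phase-locking value (PLV) referred to above, the following is a minimal sketch of one common way such a measure can be computed: each epoch is projected onto a complex exponential at the envelope frequency, and the magnitude of the mean unit phasor across epochs is taken. This is an illustration under stated assumptions, not the study's analysis pipeline; the function name `phase_locking_value`, the sampling rate, the epoch count and the synthetic data are all hypothetical.

```python
import numpy as np

def phase_locking_value(epochs, fs, freq):
    """PLV at `freq` (Hz) across epochs of shape (n_epochs, n_samples)."""
    n_samples = epochs.shape[1]
    t = np.arange(n_samples) / fs
    # Project every epoch onto a complex exponential at the target frequency.
    coeffs = epochs @ np.exp(-2j * np.pi * freq * t)
    # Discard amplitude, keep phase only, then average the unit phasors.
    unit = coeffs / np.abs(coeffs)
    return np.abs(unit.mean())

# Synthetic demonstration (assumed values, not data from the study).
rng = np.random.default_rng(0)
fs, f_env = 16000.0, 220.0            # hypothetical sampling rate, 220 Hz envelope
t = np.arange(int(0.1 * fs)) / fs     # 100 ms epochs
locked = np.sin(2 * np.pi * f_env * t) + rng.normal(0.0, 1.0, (200, t.size))
noise = rng.normal(0.0, 1.0, (200, t.size))

plv_locked = phase_locking_value(locked, fs, f_env)  # close to 1 when phase is consistent
plv_noise = phase_locking_value(noise, fs, f_env)    # close to 0 for random phase
```

A PLV near 1 indicates that the response phase at the envelope frequency is reproducible from epoch to epoch, whereas a value near 0 indicates random phase; computing it at each tested envelope frequency would give a transfer function of the kind described in the abstract.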