How can a 3-second delay exist between people on a live TV broadcast if I can see and hear both people live?

I was watching a program, and the woman being interviewed said there was a 3-second delay, yet I was looking at both the woman and the person interviewing her in "real time." So how can there be a delay if both people's mouths are moving in "real time" to the words they are saying?
