In the past few years, several case studies have shown that using occupancy information in buildings leads to energy-efficient and low-cost HVAC operation. Widely reported occupancy estimation techniques rely on temperature, humidity, CO2 concentration, cameras, motion sensors, and passive infrared (PIR) sensors. So far, few studies in the literature have used audio and speech processing as an indoor occupancy prediction technique. With rapid advances in audio and speech processing technologies, it is now more feasible and attractive to integrate audio-based signal processing components into smart buildings. In this work, we propose to use audio processing techniques (i.e., speaker recognition and background audio energy estimation) to estimate room occupancy (i.e., the number of people inside a room). Theoretical analysis and simulation results demonstrate the accuracy and effectiveness of the proposed occupancy estimation technique. Based on the occupancy estimate, smart buildings can adjust thermostat settings and HVAC operation, thereby achieving greater quality of service and drastic cost savings.