Audio Book Requirements From A Sound Engineer’s Point of View

In 2017 many people are getting into the Audio Book Narration Profession. The requirements concerning the ACX, as well as other Audio Book distributors, are both extensive as well as easy to accomplish if you can grasp the basic concepts of Audio Law. Anytime I use the phrase “AUDIO LAW” you need to make sure you follow that advice to the letter or as close as your system will allow. If you can not accomplish this with room to spare, then you have the wrong recording setup or you have failed to heed my advice or a combination of both. Hello, my name is Dana and I have been recording the “Spoken Word” for 45 years.

The first thing you need to understand is the basic theory of why these requirements exist. That is what this article is going to do: explain why you need to follow the “Laws of Audio” to meet these requirements on a Professional Level. I am also going to dispel many of the myths and bad advice floating around the www as to how you can accomplish this. This article will also contain several short video examples so you can view them to obtain a greater understanding concerning these requirements and processes. You can click here to download this article in a PDF format. You can click here to download this article in an MP3 format that meets the ACX requirements.

Just like any other business, you will need to invest in the proper tools. The list below is what I recommend for setting up your home studio. I have NO AFFILIATE status with the companies below nor do I make a penny if you purchase them. I recommend this equipment as they are used by Professional Recording Studios World Wide and I own them and use them as well.

1. Reaper DAW. DAW = Digital Audio Workstation. ($60.00)
2. Shure SM57 or 58. Dynamic Microphone. ($99.00)
3. Alesis Multimix USB Fx Mixer. ($87.00)
4. On-Stage Stands MS7201B Round Base Microphone Stand. ($23.95)
5. AKG K92 Closed-back Monitor Headphones. ($59.99)

This article will be long enough just explaining the basic requirements. The links I have provided will go into greater detail concerning Proper Recording Techniques and if you are just beginning this awesome journey, then we need to walk before we can run. I encourage you to consider taking my ACX Course for the low fee of $300.

The following requirements were taken from the ACX website and I am going to break them down one by one. Once you understand the basic requirements according to the ACX, the rest of the process will run as smooth as silk. Honestly, once you finish meeting these requirements you will have no problem taking your skills and recording any other type of venue or social event.

Your submitted audiobook must:
“be consistent in overall sound and formatting”

This may sound simple but if you are new to the “audio game”, this is one of the steps that most novice narrators will fail. The formatting process is a piece of cake. If you start with a mono file then you MUST export each file in a mono format for everything you upload for your book. You CANNOT have mono on one chapter and use stereo for another. The ACX HIGHLY RECOMMENDS you submit your audio in mono. My best advice is to “give them what they want”. Now comes the tough one, “be consistent in overall sound”. If you have to record 10 chapters and you do eight in your home studio, then go on vacation for two weeks and plan on finishing the other two chapters inside your hotel room, then you will most likely fail this requirement. Once you finish reading this article you will have NO PROBLEM maintaining consistency!

You must understand that there are “audio laws” you must adhere to in order to be consistent with your overall sound. From the very beginning you need to write down your preamp input level, recording input level, mic placement (how many inches your mic is from your mouth) as well as the offset angle of your mic (this will help reduce plosives). You need to use the same applied plugins/filters for EQs, noise reduction, gates, crossovers and things of this nature. All of this will be explained in detail and is simple to apply. The key to all of this is CONSISTENCY.

If you apply these things to chapter one, then you need to apply them to every chapter you do for that particular book including your opening and closing credits. Consistency in audio levels, tone, noise level, spacing, and pronunciation gives the listener an enjoyable experience. Drastic changes can be jarring to the listener and are not reflective of a professional production. That is the very reason I am a proponent of using the Recommended Standards for Streaming Audio as set forth by the AES (Audio Engineering Society), of which I am a PROUD MEMBER.

“include opening and closing credits”

This is pretty simple. At a minimum, the opening credits must note the name of the audiobook, the name of the author(s), and the name of the narrator(s). Example: “How to meet the ACX requirements for audio books. Written and narrated by Dana”. Closing credits must, at a minimum, state “the end”. Example: “Please join us at YourPodcastReviw dot com forward slash forum if you have any questions. THE END” or just “THE END”.

“be comprised of all mono or all stereo files”

Again, this is what I was talking about above. You must select a file format and keep that same format throughout the entire book, from the opening credits to the closing credits and all chapters in between. Again, the ACX “Highly Recommends” you submit your audio in a mono format, so let’s give them what they want. Side note: NEVER PROCESS YOUR AUDIO CHAPTERS AT DIFFERENT TIMES. In order to stay consistent, you need to do this process the same way for every section of audio you will be submitting.

“include a retail audio sample that is between one and five minutes long”

You will want to choose this sample very carefully. After all, this is what people are going to be listening to when deciding if they want to buy your book. This is your “ad” for why they should buy your audio book rather than your competition’s. You need to be very selective and make sure that at the end of your 5 min promo, you can end on a completed sentence. Ask yourself: if you were looking for an audio book, would this five-minute sample make you want to buy it, or would you move on to another narrator because the promo sounds boring or lifeless?

Each uploaded audio file must:

“contain only one chapter/section per file, with the section header read aloud”

ONE CHAPTER/SECTION: You’ll be prompted to upload each file individually. Both the opening credits and closing credits must be separate files. This ensures listeners can easily navigate between sections, and that skipping forward or backward moves them forward or back one section. This is no different than opening up a regular book and looking at the contents or chapters. If they were all one paragraph, it would be hard to figure out what was where.


These announcements help the listener understand what section of the book they are listening to without having to look at their player. If a section header is found to be missing during the ACX QA review, you will be contacted to make revisions which could delay the release of your title. Be consistent – a listener may think content is missing if most headers are read but some are not. This is very simple to do. It basically means that if you are talking about the “Prologue”, then you say Prologue at the beginning of the prologue. If you are recording chapter 3, then you start your recording with “Chapter 3”. If your transcript states, “Chapter 2, Billy gets his dog”, then record it just as the author has written it. This whole process is to keep the listener informed of where they are at in relationship to the OVERALL recording. (Book).

“have a running time no longer than 120 minutes”

Again, this is very simple to understand. Each chapter CAN NOT exceed 120 min. (Two Hours). You may have to adjust your speaking rhythm (Cadence) in order to achieve this. The easiest thing to do is figure out what speed you will need to read at and this will come with experience. You can also use an automated prompter that shows you how many words per minute it is feeding you to read. You can check out this free app for iPads here. Above all else, be CONSISTENT!
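As a rough planning aid, the running time of a chapter can be estimated from its word count before you record. The sketch below is only an illustration: the function names and the 150 words per minute pace are my assumptions, not ACX figures, so plug in your own measured reading speed.

```python
# Rough planning sketch: estimate a chapter's running time from its word count.
# The function names and the sample reading pace are illustrative assumptions.

ACX_MAX_MINUTES = 120  # per-file running time limit quoted above


def estimated_minutes(word_count: int, words_per_minute: float) -> float:
    """Return the estimated narration time in minutes."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return word_count / words_per_minute


def fits_in_one_file(word_count: int, words_per_minute: float) -> bool:
    """True if the estimated running time stays within the 120-minute limit."""
    return estimated_minutes(word_count, words_per_minute) <= ACX_MAX_MINUTES


if __name__ == "__main__":
    # Hypothetical chapter: 15,000 words read at an assumed 150 words per minute.
    print(estimated_minutes(15_000, 150))  # 100.0
    print(fits_in_one_file(15_000, 150))   # True
    print(fits_in_one_file(20_000, 150))   # False (about 133 minutes)
```

If the estimate comes out over 120 minutes, that is a hint to ask the author where the chapter can be split rather than to rush the read.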

“have room tone at the beginning and end and be free of extraneous sounds”


Each file must have 0.5 to 1 second of room tone at its beginning and 1 to 5 seconds of room tone at its end. I highly recommend you lean towards the “1 sec” in the beginning of the audio file and 3 seconds at the end. Again, what ever you choose, use that same time frame for ALL your chapters. This space is required to ensure titles are successfully encoded in the many formats made available to customers. It also gives listeners an audio cue that they have reached the beginning or end of a section. Simply put, room tone is complete silence within your recording environment with nothing being said. This will also include your “Noise Floor”. This is why your recording environment is critical and you MUST apply the basic proper procedures.
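The room tone windows above are easy to check with a few lines of code. This is a minimal sketch with hypothetical helper names; it only tests the lengths you type in and does not analyze an audio file.

```python
# Minimal sketch: check planned room tone lengths against the ranges above.
# Helper names are hypothetical; nothing here reads or edits real audio.

SAMPLE_RATE_HZ = 44_100  # the sample rate ACX asks for


def room_tone_ok(head_seconds: float, tail_seconds: float) -> bool:
    """True if head is 0.5 to 1 s and tail is 1 to 5 s."""
    return 0.5 <= head_seconds <= 1.0 and 1.0 <= tail_seconds <= 5.0


def seconds_to_samples(seconds: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Convert a duration to a whole number of samples."""
    return round(seconds * sample_rate_hz)


if __name__ == "__main__":
    print(room_tone_ok(1.0, 3.0))   # True, the lengths recommended above
    print(room_tone_ok(0.2, 3.0))   # False, head is too short
    print(seconds_to_samples(1.0))  # 44100
    print(seconds_to_samples(3.0))  # 132300
```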


Each uploaded file must be free of extraneous sounds such as plosives, mic pops, mouse clicks, excessive mouth noise, and outtakes. Extraneous sounds can distract listeners from the story, and outtakes sound unprofessional. Each can elicit bad reviews and bad reviews can hurt sales. 99.9% of this requirement can be solved using “Proper Recording Techniques”. Remember, you are trying to accomplish what is usually done in Professional Recording Studios. I promise you, with just a little effort and very little money or maybe no money at all, you can meet these requirements.

Proper Recording Technique: (Technique. noun. plural noun: techniques. “a way of carrying out a particular task, especially the execution or performance of an artistic work or a scientific procedure.”) I promise you that Recording Audio is indeed a “Scientific Procedure” as well as a “Mathematical Equation”! For the purposes of meeting the ACX requirements, the equations are simple addition and subtraction. Please click on Applying Proper Microphone Technique as well as Setting up Your Home Studio for detailed information.

This requirement is where most new narrators will fail repeatedly. In my opinion, this is one of the Easiest Requirements to accomplish if you follow the rules of Proper Audio Recording and Apply the required Audio Laws! There is a common saying all over the www concerning editing audio for the ACX. It states that it is a 1 to 4 ratio. In other words for every hour of audio you record, it will require 4 hours to properly edit it. This is just not true! The only reason it would take this long is due to the fact that the “Proper Audio Recording and AUDIO LAWS” were ignored.

Audio is Mathematics. If you fail to make “2 + 2 = 4” then just as it was in grade school, you would get that answer marked incorrect. If you have too many wrong answers, YOU FAIL! My average time is a 1 to 1.5 ratio from start to finish and I am going to teach you how to accomplish the same thing, 100% of the time if you follow my advice.

This is why I highly recommend Reaper as your DAW (Digital Audio Workstation) for editing. Please keep in mind that you can drive a “Car” or ride a “Bicycle” to work. Both will get the job done. But if it rains or drops below freezing you are going to be miserable on the “Bike” and it will take much longer to get to the end result. Once you learn a few basic steps with Reaper, you will be able to see your edits in real time and that is what is going to help you achieve the results mentioned in the above paragraph.

Can you use a free audio editing program like Audacity? Of course you can, and many do. You can use any program you like as long as you can achieve the same results. The only question you need to ask is: “What will provide me with the cleanest audio and complete the task the fastest?” You need to understand that one of the worst feelings in the world is to spend weeks or months recording your audio, only to have it rejected for failing to meet the quality control standards set forth by the ACX. You are doing this for a reason and believe it or not, time is money.

Reaper, as well as other Professional DAWs, uses what is called an ASIO driver. The ASIO protocol was developed by Steinberg, the makers of the popular multitrack recording software Cubase. The primary goal of ASIO sound card drivers was to solve one vexing problem for digital music producers: latency. ASIO sound card drivers seek to reduce this problem by bypassing all unnecessary layers and communicating with the hardware as directly as possible.

The only thing you really need to know about using ASIO is that when you use the ASIO driver, your audio will bypass most everything in your computer and go from your microphone or USB Interface directly into Reaper or any DAW that has the ability to recognize the ASIO driver as an “INPUT” device. Since you have now BYPASSED almost everything in your operating system, you will receive a CLEANER audio input signal as well as decrease the load on your CPU which in turn will reduce your Noise Floor! Reducing your noise floor BEFORE and AS you record is a CRITICAL STEP for recording Professional Quality Audio.

This is one rule of audio where “less is more”! If you CAN NOT see what your editing is doing to your audio, then you are simply stabbing at the issue at hand and you will not know what effect it had till after you render your file. This can increase your editing time tremendously! You must use the proper program to edit your audio! The more processing you need to do to your audio, the more you increase your chances of causing distortion due to over processing.

You basically need to apply three effects. 1. A low cut filter COMBINED with your EQ. 2. A Gate to remove breathing noises. 3. A Vocal Processor to bring you into the requirements of being between a -23 & -18dbs and being no louder than a -3.0dbTP. dbTP = Decibels True Peak.

“measure between -23dB and -18dB RMS and have -3dB peak value and a maximum -60dB noise floor”

First of all, you need to know that 0db is the LOUDEST input level you can achieve in recording digital audio before clipping. Anything to the right of 0dbs or +0.1dbs will cause clipping also known as spiking or overload. This is an Audio Law. You must avoid this at ALL COST! As the input numbers go away from 0dbs towards the “NEGATIVE” side of 0dbs, towards the left, the audio or input signal becomes SOFTER or QUIETER. Once you understand this mathematical formula, it is easy to see how a -23dbs is SOFTER than a -18dbs and how a -3dbs is LOUDER than a -8dbs.
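If it helps to see why a more negative number is a softer signal, the standard relationship between decibels and amplitude can be written out in a couple of lines. This is a minimal sketch, not something taken from the ACX documentation; the function name is my own, and 0 dB is assumed to mean digital full scale (an amplitude of 1.0).

```python
# Minimal sketch: convert a level in dB (relative to digital full scale)
# into a linear amplitude, where 1.0 is the loudest value before clipping.
# The function name is an illustrative assumption.


def db_to_amplitude(level_db: float) -> float:
    """Return the linear amplitude for a level given in dB (0 dB -> 1.0)."""
    return 10 ** (level_db / 20)


if __name__ == "__main__":
    print(db_to_amplitude(0))    # 1.0, full scale
    print(db_to_amplitude(-18))  # about 0.126
    print(db_to_amplitude(-23))  # about 0.071, smaller, so SOFTER than -18
    # Anything above 0 dB gives an amplitude over 1.0, which is clipping.
    print(db_to_amplitude(0.1) > 1.0)  # True
```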

So if your RMS level is at a -26dbs and your noise floor is at a -72dbs and your true peak is at a -8dbs, you can instantly tell that by changing your -8dbTP to a -3.5dbs, you will make everything 4.5dbs louder and you will pass the ACX requirements. Your -26dbs will now become a -21.5dbs and your noise floor will now become a -67.5dbs and your dbTP at a -3.5dbs still meets the ACX requirements.
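The example above is plain addition: one gain change moves the RMS level, the peak and the noise floor by the same number of dbs. The sketch below works it through; the class and function names are hypothetical, and it assumes a simple gain change with no compression or limiting.

```python
# Sketch of the arithmetic above: a single gain change shifts RMS, peak and
# noise floor by the same amount. Names are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Levels:
    rms_db: float
    peak_db: float
    noise_floor_db: float


def gain_to_reach_peak(levels: Levels, target_peak_db: float) -> float:
    """Gain in dB needed to move the current peak to the target peak."""
    return target_peak_db - levels.peak_db


def apply_gain(levels: Levels, gain_db: float) -> Levels:
    """Shift every measurement by the same gain."""
    return Levels(levels.rms_db + gain_db,
                  levels.peak_db + gain_db,
                  levels.noise_floor_db + gain_db)


def meets_acx(levels: Levels) -> bool:
    """Check the three level requirements quoted in this article."""
    return (-23.0 <= levels.rms_db <= -18.0
            and levels.peak_db <= -3.0
            and levels.noise_floor_db <= -60.0)


if __name__ == "__main__":
    before = Levels(rms_db=-26.0, peak_db=-8.0, noise_floor_db=-72.0)
    gain = gain_to_reach_peak(before, -3.5)
    after = apply_gain(before, gain)
    print(gain)               # 4.5
    print(after)              # Levels(rms_db=-21.5, peak_db=-3.5, noise_floor_db=-67.5)
    print(meets_acx(before))  # False, RMS is too soft
    print(meets_acx(after))   # True
```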

measure between -23dB and -18dB RMS: RMS stands for “Root Mean Square” and is a mathematical formula for measuring the overall loudness of your audio. You do not need to understand the math behind this equation but you must adhere to it in order to pass the ACX requirements. So how do we get between a -18dbs and -23dbs? The first thing we must do is supply the Proper Gain. This is an Audio Law. Regardless if you are trying to record a set of drums, guitars, keyboards, backup singers or lead singers, they all require PROPER INPUT LEVELS!
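For the curious, here is what “Root Mean Square” looks like when written out. This is a minimal sketch of the textbook formula, assuming samples are floats between -1.0 and 1.0; the function name is mine, and the meter in your DAW or the ACX check may window or weight the signal differently.

```python
# Minimal sketch of an RMS level measurement in dB relative to full scale.
# Assumes samples are floats in the range -1.0 to 1.0. The function name is
# an illustrative assumption, not the meter ACX actually uses.
import math
from typing import Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """Square every sample, take the mean, then express it in dB."""
    if not samples:
        raise ValueError("need at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    # 10*log10(mean square) is the same as 20*log10(root mean square).
    return 10 * math.log10(mean_square)


if __name__ == "__main__":
    # One second of a 440 Hz test tone at half of full scale, 44.1 kHz.
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 44_100) for n in range(44_100)]
    print(round(rms_dbfs(tone), 2))  # about -9.03
```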

This is why so many people have problems trying to use a USB mic that runs directly on their computer and they get so frustrated because they CAN NOT meet the ACX requirements. Their operating system can not supply the proper gain on its own WHILE AT THE SAME TIME keeping their noise floor close to or below the -60db maximum requirement. That is why I highly suggest you invest in a preamp as mentioned at the beginning of this post.

Shure makes a very popular Microphone called the SM-7. If you have ever watched live broadcasters sitting in a studio, this is the mic they are probably using. It costs around $400.00. Many Podcasters bought this mic thinking it would make them sound like “Wolfman Jack”. Little did they know that not only is that impossible but they also later had to purchase a pre-amp in order to supply this mic with the proper gain.

I have answered close to 100 emails in the last SIX MONTHS asking why their $400.00 mic sounded worse than their $79.00 ATR 2100. When I asked them what preamp they use, most reply “what’s a preamp or nobody told me I needed one”. A mid level preamp to power this mic starts around $500.00. A high dollar pre amp chimes in around two grand! So after spending close to $1,000.00, they still don’t sound like Wolfman Jack. So why do I mention this? Because you can achieve Professional Results without spending THOUSANDS of DOLLARS!

So what is the proper input level? This level is between a -12dbs and -18dbs. This gives you the CLEANEST signal to noise ratio possible! This is also the level most major manufacturers design their Audio Interfaces to perform their best at. If you are using a USB mic by itself and all you can achieve is a -30dbs or -40dbs input, you have already condemned yourself to failure!

Even though you may have your noise floor down around a -70dbs while recording at a -40db input, by the time you raise your level to the -23dbs to -18dbs requirement, you will be forced to make your noise floor LOUDER as well. Here is the simple math. To bring a -40dbs up to a -18dbs you must add 22dbs of gain (40 minus 18 = 22), and a -18dbs is within our -23 to -18db ACX RMS range. You must now ADD that same 22dbs to your -70db noise floor: -70dbs plus 22dbs = -48dbs. Remember the closer we get to 0dbs, the louder the audio is. So as you can see, a -48dbs is much LOUDER than a -70dbs.

While you may hit your target range for your -23dbs to -18dbs RMS level now, you WILL FAIL your noise floor by 12dbs, because a -48dbs is 12dbs louder than the -60db maximum. If you try and use “NOISE REDUCTION” to remove this extra 12dbs to meet the -60db max Noise Floor requirement, YOU WILL MORE THAN LIKELY DEGRADE YOUR AUDIO to the point of it being rejected by the ACX when it is checked by “Human Ears”.
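The low-input problem can be worked through in a few lines. This is a sketch under a simplifying assumption: the gain you add after recording raises the voice and the noise floor by exactly the same amount. The function names are hypothetical.

```python
# Sketch: how much the noise floor rises when a quiet recording is brought
# up to the target RMS level. Assumes gain added after recording raises the
# voice and the noise equally. Function names are illustrative assumptions.

ACX_MAX_NOISE_FLOOR_DB = -60.0


def makeup_gain_db(recorded_rms_db: float, target_rms_db: float) -> float:
    """Gain needed to move the recorded RMS level up to the target."""
    return target_rms_db - recorded_rms_db


def noise_floor_after_gain(noise_floor_db: float, gain_db: float) -> float:
    """The noise floor moves by the same gain as everything else."""
    return noise_floor_db + gain_db


def noise_floor_excess_db(noise_floor_db: float) -> float:
    """How far the noise floor is above the -60 dB maximum (0.0 if it passes)."""
    return max(0.0, noise_floor_db - ACX_MAX_NOISE_FLOOR_DB)


if __name__ == "__main__":
    # Recorded too quietly: -40 RMS with a -70 noise floor, target -18 RMS.
    gain = makeup_gain_db(-40.0, -18.0)
    floor = noise_floor_after_gain(-70.0, gain)
    print(gain, floor, noise_floor_excess_db(floor))  # 22.0 -48.0 12.0

    # Recorded at a proper -18 input with the same -70 noise floor.
    gain = makeup_gain_db(-18.0, -18.0)
    floor = noise_floor_after_gain(-70.0, gain)
    print(gain, floor, noise_floor_excess_db(floor))  # 0.0 -70.0 0.0
```

The second case is the whole point of supplying proper gain up front: no makeup gain is needed, so the noise floor stays where you recorded it.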

If you try and “Gate It Out”, you will destroy your file beyond repair. Again, all of this becomes a non-issue when we apply the proper gain, to begin with. This is a direct result of using Proper Recording Techniques.

“have -3dB peak value”

By leaving this headroom you’ll reduce the possibility of distortion, which can reduce the quality of the listening experience. This headroom is also needed to ensure files are successfully encoded. Remember when we talked about going past 0dbs while recording and causing your audio to clip? That is what this requirement is referring to. All this means is that at NO TIME can your audio be louder than a -3dbs. The rule of thumb here is to set your maximum loudness at a -3.5dbs to help ensure you meet this requirement.

Remember that a -3.5 dbs is softer than a -3dbs. You should start to realize now why our input levels are set between a -12dbs and -18dbs. The -12dbs give you 12dbs of HEAD ROOM before your audio will start clipping. This is your SAFETY NET.

“a maximum -60dB noise floor”: There are so many factors that can and will affect this requirement, that books have been written concerning this requirement alone! This same theory is also used by Professional Recording Studios for applying sound treatment to their recording environments as well. I promise you it’s not that hard to accomplish this process but in the same breath, this process will consist of many AUDIO LAWS that must be followed. It basically boils down to two things.

1. If you can hear the TV in the next room, kids playing outside, traffic going by, your upstairs neighbor, dogs barking, dishwashers, AC or Heat vents blowing, refrigerators running, window AC units, your inside/outside AC unit, lawn mowers, airplanes, wind blowing, rain, thunder, music from your neighborhood or ANYTHING while inside your recording environment, so can your mic and this will affect your noise floor tremendously! This is an AUDIO LAW.

If you are using a Condenser Microphone, this can spell DEATH for your audio. That is why I recommended using the Shure SM57 or 58 mics. They are dynamic mics and will pick up less background noise than a Condenser mic. As a rule of thumb, condenser mics are for Professional Studio use. However, this is not an Audio Law and depending on your recording environment, you may be able to use one. The one I use the most is an AKG P-120. It is a large diaphragm mic with a -20db pad switch and a low cut filter switch located on the mic and costs $99.99.

2. Your recording environment. Again, there have been many books written concerning this issue and I will write a detailed post on how to limit this effect. This is the “Readers Digest” version. There are many factors that will affect your audio but the two main factors are “Noise Reflection and Noise Intrusion”. One of the best recording places in a Home Studio is simply an interior walk-in closet. The worst place you can record is in your kitchen, unfinished basement or bathroom. The reason is simple once you understand the process. Why do Professional Studios have “Sound Booths”? There are two reasons. To keep unwanted sound out and to keep your voice from reflecting off of hard surfaces. THAT’S THE SOLE PURPOSE!

That is why you should never record in your kitchen, bathroom or an unfinished basement. There is nothing to absorb your voice and keep it from reflecting off of hard surfaces. The “Speed of Sound” is 340.29 meters per second at sea level and the more area your voice can bounce off of, the more your mic will pick up. Once you understand this, you can see why an “Interior Walk-in Closet” is a great location to record in.

The clothes that are hanging around you do a great job of absorbing your voice which in turn will keep it from “BOUNCING ALL OVER THE ROOM”. If you CAN NOT prevent this, it will add background noise to your noise floor as well as cause an artificial echo or reverb to your audio. There are many ways you can reduce this from happening and I will explain that process in another post. Best of all, it is extremely cheap to do and you probably also have most of the material to help accomplish this task, all around you. While a walk-in closet is ideal, it is not the only way to accomplish this requirement.

“be a 192kbps or higher MP3, Constant Bit Rate (CBR) at 44.1 kHz”: This is a simple process and you will use your DAW or audio software to set these requirements. Without going into too much detail on something you really do not need to understand:

“kbps” stands for kilobits per second (thousands of bits per second) and is a measure of data that can flow in a given situation. CDs are produced at about 1,411 kbps while NPR (National Public Radio) broadcasts at 64kbps over the internet and sounds great! The higher the number, the better the quality.

“MP3” (MPEG-1 Audio Layer-3) is a standard technology and format for compressing a sound sequence into a very small file (about one-twelfth the size of the original file) while preserving most of the original sound quality when it is played. MP3 provides near CD quality audio.

“Constant Bit Rate (CBR)” is a term used in telecommunications, relating to the quality of service. Compare with variable bitrate. When referring to codecs, constant bit rate encoding means that the rate at which a codec’s output data should be consumed is constant.

“44.1 kHz” is a measurement in digital audio. It is a common sampling frequency. Analog audio is recorded by sampling it 44,100 times per second, and then these samples are used to reconstruct the audio signal when playing it back.
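To put these numbers in perspective, the bit rate of an uncompressed wav file and the size of a constant bit rate MP3 are both simple multiplication. The sketch below assumes a 16-bit mono wav file, which is a common choice but not something the ACX requirements above specify, and the function names are my own.

```python
# Sketch: bit rate of uncompressed audio and file size of a CBR MP3.
# The 16-bit depth and the function names are illustrative assumptions.


def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Bit rate of uncompressed (wav) audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


def cbr_file_size_mb(bitrate_kbps: float, minutes: float) -> float:
    """Approximate size of a constant bit rate file in megabytes (10^6 bytes)."""
    total_bits = bitrate_kbps * 1000 * minutes * 60
    return total_bits / 8 / 1_000_000


if __name__ == "__main__":
    print(pcm_bitrate_kbps(44_100, 16, 1))  # 705.6 for a 16-bit mono wav
    print(cbr_file_size_mb(192, 60))        # 86.4 for a one-hour chapter
    print(cbr_file_size_mb(192, 120))       # 172.8 for the 120-minute maximum
```

Because the bit rate is constant, the file size depends only on the running time, which is one reason a CBR file is predictable to encode and stream.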

Once you are set up in your DAW, you will simply set this as the “Second Default” for producing your final audio file. The “First Default” will be set up to save your files in a wav format. THIS IS A MUST! Once we start getting into the proper use of our software, this will be explained in greater detail. There is nothing worse than laying down 35 or 40 minutes of audio without saving it, only to have the power go out and lose it all, or to make a correction and decide later on that you need to remove that correction and have no way of doing so.

While this is not an AUDIO LAW, you need to live by these three statements when it comes to editing audio files! 1.SAVE THE FILE FIRST BEFORE RECORDING AND SAVE IT OFTEN! 2. EVERY CORRECTION YOU MAKE TO YOUR FILE USE THE “SAVE AS” COMMAND! 3. ALWAYS SAVE YOUR ORIGINAL FILE IN A WAV FORMAT. This will allow you to go back and review your audio in the original state as well as the corrected state.

If you are still trying to wrap your head around these requirements, I would love to have you take my one on one, face to face ACX Course. You can find out more about it here. Thanks, Dana.
