Excerpts from the requirements are as follows:
The prospect is looking for audio collection.
The prospect has requested the recording of a new dataset; the details of the requirement are listed below.
Accent: Native English speaker from New York (i.e., not a US West Coast accent)
Gender: Female
Age: Born in 1999 – 2001 (young voice)
Expressive / high prosody: The dataset must be expressive, with varied intonation, stress, and rhythm in the recordings so that different speech styles can be modeled. Non-speech sounds also increase expressiveness; see the non-speech categories listed below.
Non-speech categories: Lip smack, Laughing, Breath noise, Inhaling, Hum, Um
Unique Words: > 40K
Total audio length: > 24h
Phonemes: Phonetically balanced, with 100 hours of voice data.
1 hour per person, with the above non-speech categories included.
The ask is for feasibility, samples, pricing and timelines.
Scenarios & Pricing:
A) Script and voice of a 20 to 21 year old female speaker (at least 24 to 30 hours). Scripts must include Generation Z style conversation (non-speech scripts must be included).
B) Script and voice of a 20 to 21 year old female speaker (at least 24 to 30 hours). Scripts must include Generation Z style conversation (without non-speech scripts).
C) 100 × 1 hour per person of scripts and data (non-speech scripts must be included).
D) 100 × 1 hour per person of scripts and data (without non-speech scripts).
Need Quote For:
1) Case A
2) Case B
3) Case A + C
4) Case B + D
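
For sizing the four quotes, the recording hours per case can be tallied with a short sketch like the one below. It rests on two assumptions that the brief does not state outright: that "24h ~ 30h" in scenarios A and B is a range of 24 to 30 hours, and that "100 * 1 hour / person" in scenarios C and D means 100 speakers at 1 hour each, i.e. 100 hours. The names SCENARIO_HOURS, QUOTE_CASES, and total_hours are illustrative only.

```python
# Hours per scenario as (low, high). Assumption: A and B are a 24-30 hour
# range; C and D are 100 speakers x 1 hour each = 100 hours.
SCENARIO_HOURS = {
    "A": (24, 30),
    "B": (24, 30),
    "C": (100, 100),
    "D": (100, 100),
}

# The four quotes requested, expressed as combinations of scenarios.
QUOTE_CASES = {
    "1) Case A": ["A"],
    "2) Case B": ["B"],
    "3) Case A + C": ["A", "C"],
    "4) Case B + D": ["B", "D"],
}


def total_hours(scenarios):
    """Return the (low, high) total recording hours for a list of scenarios."""
    low = sum(SCENARIO_HOURS[s][0] for s in scenarios)
    high = sum(SCENARIO_HOURS[s][1] for s in scenarios)
    return low, high


for label, scenarios in QUOTE_CASES.items():
    low, high = total_hours(scenarios)
    print(f"{label}: {low} to {high} hours")
```

Under these assumptions, quotes 1 and 2 cover 24 to 30 hours each, and quotes 3 and 4 cover 124 to 130 hours each.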