* Welcome! [#k4f3be02]

> The [[HMM/DNN-based Speech Synthesis System (HTS)>http://hts.sp.nitech.ac.jp/]] has been developed by the HTS working group and others (see [[Who we are]] and [[Acknowledgments]]).
The training part of HTS has been implemented as a modified version of [[HTK:http://htk.eng.cam.ac.uk/]] and is released in the form of patch code to HTK.
The patch code is released under a free software license.
However, it should be noted that &color(red){once you apply the patch to HTK, you must obey the [[license of HTK:http://htk.eng.cam.ac.uk/docs/license.shtml]].};
Related publications about the techniques and algorithms used in HTS can be found [[here>Publications]].

// 2.3
> HTS version 2.3 includes VBLR speaker adaptation, a DAEM-based parameter generation algorithm, and other minor new features.
Many bugs in HTS version 2.2 were also fixed.
HTS does not include any text analyzers, but the [[Festival Speech Synthesis System>http://www.festvox.org/festival/]] (English, Spanish, etc.), [[DFKI MARY Text-to-Speech System>http://mary.dfki.de/]] (German, English, etc.), [[Flite+hts_engine>http://hts-engine.sourceforge.net]] (English), [[Open JTalk>http://open-jtalk.sourceforge.net/]] (Japanese), or other text analyzers can be used with HTS.
HTS slides are also released as a tutorial on HMM-based speech synthesis.

> This distribution includes demo scripts for training speaker-dependent and speaker-adaptive systems using the [[CMU ARCTIC database>http://www.festvox.org/cmu_arctic/]] (English).
For training other voices, demo scripts using the NITech database (Portuguese, Japanese, and Japanese song) are also released.

> In addition, the HTS version 2.3.1 demo scripts support a frame-by-frame modeling option using &color(red){DNN (deep neural network)}; based on HMM state alignment.

// 2.2
//> HTS version 2.2 includes deterministic annealing EM algorithm in parameter estimation step, KLD-based state-mapping and cross-lingual speaker adaptation, minimum generation error (MGE) training, and other minor new features.
//Many bugs in HTS version 2.1.1 were also fixed.
//HTS does not include any text analyzers but the [[Festival Speech Synthesis System>http://www.festvox.org/festival/]] (English, Spanish, etc.), [[DFKI MARY Text-to-Speech System>http://mary.dfki.de/]] (German, English, etc.), [[Flite+hts_engine>http://hts-engine.sourceforge.net]] (English), [[Open JTalk>http://open-jtalk.sourceforge.net/]] (Japanese), or other text analyzers can be used with HTS.
//HTS slides are also released as a tutorial of HMM-based speech synthesis.
//> This distribution includes demo scripts for training speaker-dependent and speaker-adaptive systems using [[CMU ARCTIC database>http://www.festvox.org/cmu_arctic/]] (English).
//For training other voices, demo scripts using Nitech database (Portuguese, Japanese, and Japanese Song) are also released.
// 2.1.1
//> HTS version 2.1.1 is based on HTK-3.4.1 and includes forced-alignment of hidden semi-Markov model (HSMM) and other minor new features.
//Many bugs in HTS version 2.1 were also fixed.
//HTS does not include any text analyzers but the [[Festival Speech Synthesis System:http://www.festvox.org/festival/]], [[DFKI MARY Text-to-Speech System:http://mary.dfki.de/]], [[Flite+hts_engine>http://hts-engine.sourceforge.net]], or other text analyzers can be used with HTS.
//This distribution includes demo scripts for training speaker-dependent and speaker-adaptive systems using [[CMU ARCTIC database:http://www.festvox.org/cmu_arctic/]] (English).
//> For training Japanese voices, a demo script using the Nitech database is also prepared. Japanese voices trained by the demo script can be used on [[Open JTalk>http://open-jtalk.sourceforge.net/]], which is a Japanese text-to-speech synthesis system.
// 2.1
//> HTS version 2.1 includes hidden semi-Markov model (HSMM) training/adaptation/synthesis, speech parameter generation algorithm considering global variance (GV), SMAPLR/CSMAPLR adaptation, and other minor new features. Many bugs in HTS version 2.0.1 were also fixed. The API for runtime synthesis module, hts_engine API, version 1.0 was also released. Because hts_engine can run without the HTK library, users can develop their own open or proprietary software based on hts_engine. HTS and hts_engine API do not include any text analyzers but the [[Festival Speech Synthesis System:http://www.festvox.org/festival/]], [[DFKI MARY Text-to-Speech System:http://mary.dfki.de/]], or other text analyzers can be used with HTS. This distribution includes demo scripts for training speaker-dependent and speaker-adaptive systems using [[CMU ARCTIC database:http://www.festvox.org/cmu_arctic/]] (English).
//Six HTS voices for Festival 1.96 are also released. They use the hts_engine module included in Festival. Each of HTS voices can be used without any other HTS tools.
//> For training Japanese voices, a demo script using the Nitech database is also prepared. Japanese voices trained by the demo script can be used on [[GalateaTalk:http://hil.t.u-tokyo.ac.jp/~galatea/]], which is a speech synthesis module of an open-source toolkit for anthropomorphic spoken dialogue agents developed in [[Galatea project:http://hil.t.u-tokyo.ac.jp/~galatea/]]. An HTS voice for Galatea trained by the demo script is also released.

* News! [#ve28e7f9]

- ''March 12, 2020''
- ''March 12, 2021''
> The code to train DNN-HSMM for text-to-speech synthesis was released.~
DNN-HSMM maps phoneme (state)-level linguistic features into hidden semi-Markov model parameters.~
- The code:
-- Supports model training based on a maximum likelihood criterion.
-- Supports maximum likelihood parameter generation (MLPG).
- ''December 25, 2017''
> HTS version 2.3.2 was released.~
Its new features are
- Demo scripts:
-- Add trajectory training considering global variance based on DNN (deep neural network).
-- Add speaker adaptive training for DNN. (It trains the connection weights of the whole DNN for each speaker.)
- ''December 25, 2016''
> HTS version 2.3.1 was released.~
Its new features are
- Demo scripts:
-- Add frame-by-frame modeling option using DNN (deep neural network) based on HMM state alignment.
- ''December 25, 2015''
> HTS version 2.3 was released.~
Its new features are
- HERest:
-- Add VBLR adaptation.
- HMGenS:
-- Add DAEM-based parameter generation.
-- Support DP search to determine state duration when the model alignments are given.
- HInit, HRest, HRest:
-- Support parallel mode.
- HHEd:
-- Speed up context-clustering by calculating differences between answers to current and previous questions.
-- Add untying weights function in HHEd.
- Demo scripts:
-- Add modulation spectrum-based postfilter.
-- Support text files instead of utt files for general English database.
-- Turn off spectrum normalization in STRAIGHT.
-- Add LSP postfilter.
-- Support mel-cepstrum based aperiodic measure generated by STRAIGHT.
-- Support new HTS voice format for hts_engine API.
-- Integrate normal demo and STRAIGHT demo.
- ''December 25, 2014''
> HTS version 2.3 beta was released to the hts-users ML members.
- ''May 1, 2013''
> A tutorial about HMM-based speech synthesis was published in Proceedings of the IEEE: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6495700
- ''December 25, 2012''
> HTS version 2.3 alpha was released to the hts-users ML members.
//- ''July 7, 2011''
//> HTS version 2.2 was released.~
//Its new features are
//- HERest:
//-- Support DAEM algorithm in parameter estimation step.
//- HHEd:
//-- Support KLD-based state-mapping and cross-lingual speaker adaptation.
//-- Context-clustering can be started in the middle of the tree building.
//- HMgeTool:
//-- Add ECD-based MGE training command, HMgeTool.
//- HSMMAlign:
//-- Add stand-alone HSMM based forced-alignment command, HSMMAlign.
//- Demo scripts:
//-- Change sampling frequency from 16kHz to 48kHz.
//-- Support bark critical-band based aperiodic measure.
//-- Change speaker and singer of Brazilian Portuguese and Japanese song demo, respectively.
//- Slides:
//-- Release slides as a tutorial of HMM-based speech synthesis.
//- ''March 3, 2011''
//> HTS version 2.2 beta was released to the hts-users ML members.
//- ''December 25, 2010''
//> HTS version 2.2 alpha was released to the hts-users ML members.
//- ''May 14, 2010''
//> HTS version 2.1.1 was released.~
//Its new features are
//- Based on HTK-3.4.1
//- Many bug fixes
//- HFst:
//-- WFST converter for forced-alignment of HSMM
//- HMGenS:
//-- Initial GV weight for parameter generation
//-- Model-level alignments given from label of singing voice to determine note-level durations
//- HHEd:
//-- Memory reduction options for context-clustering
//- Demo scripts:
//-- Context-dependent GV without silent and pause phoneme
//-- Demo using the Nitech Japanese database for singing voice synthesis
//- ''December 25, 2009''
//> HTS version 2.1.1 beta was released to the hts-users ML members.
//- ''August 27, 2009''
//> The first HTS meeting in [[Interspeech 2009:http://www.interspeech2009.org/conference/]].
//- ''May 22, 2009''
//> HTS-Demo for Brazilian Portuguese is released.
// - ''March 16, 2009''
// > Prof. Keiichi Tokuda & Dr. Heiga Zen have a [[tutorial about HMM-based speech synthesis>Tutorial]] at [[Interspeech 2009:http://www.interspeech2009.org/conference/]].
//- ''July 31, 2008''
//> The API of runtime synthesis engine, hts_engine API, was split from HTS itself and moved to [[SourceForge:http://hts-engine.sourceforge.net/]].~
// hts_engine API version 1.01 and Flite+hts_engine version 0.90 were released.
//- ''July 14, 2008''
//> [[Keiichiro Oura:http://www.sp.nitech.ac.jp/~uratec/]] took over as the maintainer of HTS from [[Heiga Zen:http://www.sp.nitech.ac.jp/~zen/]].
//- ''June 27, 2008''
//> HTS version 2.1 and hts_engine API version 1.0 were released.~
//Their new features are
//- HTS-2.1
//-- Many bug fixes
//-- Released under the [[New and Simplified BSD license:http://www.opensource.org/]]
//-- Simple documentation
//-- 64-bit compile support
//-- MAXSTRLEN (max length of strings), SMAX (max # of streams), and PAT_LEN (max length of patterns) can be set through configure script like
// ./configure MAXSTRLEN=1024 SMAX=20
//-- HFB:
//--- HSMM training and adaptation
//-- HAdapt:
//--- SMAPLR/CSMAPLR adaptation
//-- HGen:
//--- Speech parameter generation algorithm considering GV
//--- Random generation of state transitions, state durations, and mixture components (by configuration variable RNDFLAGS)
//-- HMGenS:
//--- Speech parameter generation from HSMMs
//-- HHEd:
//--- Add DM command to delete existing macros
//--- Add IT command to impose pre-built trees in clustering
//--- Add JM command to merge difference models on state or stream levels
//--- MU command supports '*2' style mixing up
//--- MU command supports mixture-level occupancy threshold in mixing up (by configuration variable MINMIXOCC)
//- hts_engine API-1.0:
//-- Released under the [[New and Simplified BSD license:http://www.opensource.org/]]
//-- Support LSP-type parameters including LSP, mel-LSP, and MGC-LSP
//-- Speech parameter generation algorithm considering GV
//- ''June 13, 2008''
//> HTS version 2.1RC2 and hts_engine API version 0.99 were released to the hts-users ML members.~
//See [[here:http://hts.sp.nitech.ac.jp/hts-users/spool/2008/msg00336.html]] for details.
// - ''May 27, 2008''
// > HTS voice building tools for the MARY platform were released with [[DFKI MARY 3.6.0:http://mary.dfki.de/Download/mary-3-6-0-released]].
//
// - ''March 24, 2008''
// > HTS version 2.1RC1 and hts_engine API version 0.96 were released to the hts-users ML members. See [[here:http://hts.sp.nitech.ac.jp/hts-users/spool/2008/msg00175.html]] for details.
// - ''January 15, 2008''
// > HTS version 2.1beta and hts_engine API version 0.95 were released to the hts-users ML members.
// - ''December 7, 2007''
// > hts_engine was ported to Java and included in [[DFKI MARY 3.5:http://mary.dfki.de/Download/mary-3-5-0-released]].
// - ''November 1, 2007''
// > HTS version 2.1alpha was released to the hts-users ML members.
// - ''October 1, 2007''
// > HTS version 2.0.1 and hts_engine_API version 0.9 were released.~
// The new features are
// - Many bug fixes.
// - Band structure for linear transforms.
// - Stream-dependent variance flooring scales.
// - State duration model mmf structure is changed. In the previous versions we used a multi-variate Gaussian PDF to represent state duration PDFs of an HMM. However, from this version we use multi-stream structure. This is very important for the future HSMM support.
// - Demo scripts support LSP-type parameters for spectral representation in addition to cepstral ones.
// - API-style implementation of hts_engine. Old stand-alone hts_engine will be thrown away.
// - ''September 20, 2007''
// > HTS version 2.0.1RC1 was released to the hts-users ML members.
// - ''September 18, 2007''
// > HTS version 2.0.1RC1 was released to the internal working group members.