Hacker News | rhdunn's comments

VLC does not show up when searching for "music player" on nixpkgs [1] which is what was used. Searching for "media player" does [2].

[1] https://search.nixos.org/packages?channel=unstable&query=mus...

[2] https://search.nixos.org/packages?channel=unstable&query=med...


This still comes across as ignorance of the area. How does one miss VLC?

One of the annoying things about having a living standard is that it is difficult to implement a conforming version, as additional updates mean that you are no longer conforming.

Versioned standards allow you to know that you are compliant with that version of the specification, and to track the changes between versions -- i.e. what additional functionality you need to implement.

With "living standards" you need to track the date/commit you last checked and do a manual diff to work out what has changed.


AI in this sense means using Machine Learning (ML)/Neural Networks (NN) to convert the text (or phonemes) to audio.

There are effectively two approaches to voice synthesis: time-domain and pitch-domain.

In time-domain synthesis you are concatenating short waveforms together. These are variations of Overlap and Add: OLA [1], PSOLA [2], MBROLA [3], etc.
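As a toy illustration of the overlap-and-add idea (not any particular synthesizer; the function names here are made up), this sketch windows short grains with a Hann window and sums them at a fixed hop:

```kotlin
import kotlin.math.PI
import kotlin.math.cos

// Hann window (periodic form): at 50% overlap these windows sum to 1,
// so overlap-added grains reconstruct the signal without amplitude ripple.
fun hann(n: Int, size: Int): Double = 0.5 - 0.5 * cos(2.0 * PI * n / size)

// Overlap-and-add: place each windowed grain `hop` samples after the previous one.
fun overlapAdd(grains: List<DoubleArray>, hop: Int): DoubleArray {
    val size = grains.first().size
    val out = DoubleArray(hop * (grains.size - 1) + size)
    for ((i, grain) in grains.withIndex()) {
        for (n in grain.indices) {
            out[i * hop + n] += grain[n] * hann(n, size)
        }
    }
    return out
}
```

Varying the hop per grain is what lets pitch-synchronous variants like PSOLA stretch or compress pitch periods independently of the spectral envelope.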

In pitch-domain synthesis, the analysis and synthesis happen in the pitch domain via the Fast Fourier Transform (visualized as a spectrogram [4]), often adjusted to the Mel scale [5] to better highlight the pitches and overtones. The TTS synthesizer then generates these pitches and converts them back to the time domain.
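The Hz-to-mel adjustment mentioned above is just a formula; this sketch uses the common 2595/700 (HTK-style) variant, though other constants exist:

```kotlin
import kotlin.math.log10
import kotlin.math.pow

// The mel scale is roughly linear below 1 kHz and logarithmic above it,
// matching how we perceive pitch spacing; 1000 Hz maps to ~1000 mels.
fun hzToMel(hz: Double): Double = 2595.0 * log10(1.0 + hz / 700.0)

fun melToHz(mel: Double): Double = 700.0 * (10.0.pow(mel / 2595.0) - 1.0)
```

Mel-spectrogram vocoders generate frames on this warped frequency axis precisely because equal steps in mels are closer to perceptually equal steps in pitch.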

The basic idea is to extract the formants (pitch bands for the fundamental frequency and overtones) and have models for these. Some techniques include:

1. Klatt formant synthesis [6]

2. Linear Predictive Coding (LPC) [7]

3. Hidden Markov Model (HMM) [8]

4. WaveGrad NN/ML [9]
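A single formant in a Klatt-style synthesizer is essentially a damped second-order resonator; a minimal sketch (parameter choices are illustrative, and a real synthesizer runs a pulse-train or noise excitation through a bank of these):

```kotlin
import kotlin.math.PI
import kotlin.math.cos
import kotlin.math.exp

// Second-order IIR resonator tuned to one formant frequency.
// The pole radius sets the bandwidth (damping); the pole angle sets the pitch band.
fun formantFilter(input: DoubleArray, freqHz: Double, bandwidthHz: Double, sampleRate: Double): DoubleArray {
    val r = exp(-PI * bandwidthHz / sampleRate)   // pole radius from bandwidth
    val theta = 2.0 * PI * freqHz / sampleRate    // pole angle from centre frequency
    val a1 = 2.0 * r * cos(theta)
    val a2 = -r * r
    val out = DoubleArray(input.size)
    for (n in input.indices) {
        val y1 = if (n >= 1) out[n - 1] else 0.0
        val y2 = if (n >= 2) out[n - 2] else 0.0
        out[n] = input[n] + a1 * y1 + a2 * y2
    }
    return out
}
```

Feeding an impulse in produces a decaying sine at the formant frequency; LPC [7] can be seen as fitting the coefficients of filters like this directly from recorded speech.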

[1] https://en.wikipedia.org/wiki/Overlap%E2%80%93add_method

[2] https://en.wikipedia.org/wiki/PSOLA -- Pitch-synchronous Overlap and Add

[3] https://en.wikipedia.org/wiki/MBROLA -- Multi-Band Resynthesis Overlap and Add

[4] https://en.wikipedia.org/wiki/Spectrogram

[5] https://en.wikipedia.org/wiki/Mel_scale

[6] https://en.wikipedia.org/wiki/Dennis_H._Klatt

[7] https://en.wikipedia.org/wiki/Linear_predictive_coding

[8] https://www.cs.cmu.edu/~awb/papers/ssw6/ssw6_294.pdf

[9] https://arxiv.org/abs/2009.00713 -- WaveGrad: Estimating Gradients for Waveform Generation


Kotlin does have interop with Java, but it is limited either by features not existing in Java (non-nullable types) or by features behaving differently in Java (records, etc.).

You have to explicitly annotate that a Kotlin data class is a Java record due to the limitations Java has on records compared to data classes [1]. This is similar to adding nullable/not-null annotations in Java that are mapped to Kotlin's nullable/non-nullable types.
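For example (class name hypothetical), a data class only compiles down to a Java record when explicitly marked, and only when the compiler targets JVM 16 or newer:

```kotlin
// With the annotation, Java code sees this as `record Point(int x, int y)`;
// without it, Java sees an ordinary class with getters.
@JvmRecord
data class Point(val x: Int, val y: Int)
```

The annotation also enforces the record restrictions (no `var` properties, no extra backing state), which is why it is opt-in rather than automatic.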

Where there is a clean 1-1 mapping and you are targeting the appropriate version of Java, the Kotlin compiler will emit the appropriate Java bytecode.

[1] https://kotlinlang.org/docs/jvm-records.html#declare-records...


I view the relationship between Kotlin and Java like that between C++ and C.

The two-way interop is one of Kotlin's advantages, as it makes porting code from Java to Kotlin easier, as well as using existing Java libraries. For example, you don't need something like Scala's `asJava` and `asScala` converters, as the language/standard library does that mapping for you.
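For instance (a made-up helper for illustration), a Java `ArrayList` can be passed straight into Kotlin code and used with Kotlin's extension functions, with no wrapper or converter call involved:

```kotlin
// Kotlin's List/MutableList are mapped types over java.util.List, so Java
// collections flow into Kotlin APIs (and back) without any conversion step.
fun firstWords(lines: java.util.ArrayList<String>): List<String> =
    lines.map { it.substringBefore(' ') }
```

The same mapping works in reverse: a Kotlin `MutableList` handed to Java code is just a `java.util.List` there.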

The interop isn't always perfect or clean due to the differences in the languages. But that's similar to writing virtual function tables in C -- you can do it, and have interop between C and C++ (such as with COM) but you often end up exposing internal details.


It's not just screen reader users. I use TTS to listen to text content, and the AI TTS voices I've tried have issues with skipping words or generating garbled output in sections.

I don't know if this is a data/transcription issue, an issue with noisy audio, or what.


And at the next election -- if the current polling [1][2] stays consistent -- they are likely to get 15-85 seats, which is not enough for them to gain power. Even then they are unlikely to form a coalition, as Labour are not doing well in the polls currently, and the gain in support for the Greens is largely coming at Labour's expense.

[1] https://www.electoralcalculus.co.uk/homepage.html

[2] https://en.wikipedia.org/wiki/Opinion_polling_for_the_next_U...


Works in Ladybird as well.

Maps were added in XPath 3.1 -- https://www.w3.org/TR/xpath-31/#id-maps.

There's currently work on XPath 4.0 -- https://qt4cg.org/specifications/xquery-40/xpath-40.html.
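A small taste of the XPath 3.1 map syntax (keys and values here are arbitrary); maps are functions, so lookup can use either call syntax or the `?` lookup operator:

```
map { "en": "English", "de": "German" }("de")  (: function-call lookup :)
map { "en": "English", "de": "German" }?de     (: lookup operator :)
```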


Using neural nets (machine learning) to train TTS voices has been around for a long time.

[1] (2016 https://arxiv.org/abs/1609.03499) WaveNet: A Generative Model for Raw Audio

[2] (2017 https://arxiv.org/abs/1711.10433) Parallel WaveNet: Fast High-Fidelity Speech Synthesis

[3] (2021 https://arxiv.org/abs/2106.07889) UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

[4] (2022 https://arxiv.org/abs/2203.14941) Neural Vocoder is All You Need for Speech Super-resolution

