Application engineer

Staying humble, building everything

Because I do not know, I can create everything


Leland

Building applications to empower the world. Based in Tokyo.


Current Status: Not open to work, connections welcome 🤗

Latest Publications


Controllable Text-To-Speech with FastSpeech2

1/31/2026

#ai #nlp #speech-synthesis

A summary of research on enhancing control over synthesized voice timbre, tone, and emotion while maintaining naturalness.

This is the final write-up of my B.E. graduation project. The code is open source at: https://github.com/aucki6144/ctts

Introduction

Speech synthesis, the task of converting text into natural-sounding speech, is a central topic in AI, NLP, and speech processing. While recent advancements in deep learning have significantly improved the naturalness and robustness of speech synthesis, there are still challenges in controlling the nuances of synthesized speech, such as tone, pitch, and emotion. This b...


Reading Notes: Deduplicating Training Data Makes Language Models Better

10/22/2024

#paper #machine-learning #nlp

Reading notes on how deduplicating web-crawled training data affects language model quality.

Reading notes - Deduplicating Training Data Makes Language Models Better

1 Introduction & Motivation

A key factor behind the recent progress in natural language processing (NLP) and large language models (LLMs) is the rapid growth in both model parameters and dataset size. This has pushed the field toward web-crawled datasets, whose quality is not guaranteed. Manual review is too expensive to perform. It's impossible for us to regulate and design the datasets to guarant...


Algorithm Design for Big Data

3/15/2024

#algorithms #distributed-computing #spark

Parallel algorithm design patterns including Prefix Sums and Sample Sort using Spark's mapPartitions.

Spark: Algorithm Design for Big Data

Embarrassingly parallel problems

......


Spark: Job Scheduling and Locality

3/10/2024

#spark #internals #distributed-systems

How Spark schedules jobs, stages, and tasks based on data locality and memory management.

Spark: Job Scheduling

Operations on RDDs

......


Spark: Data Partitioning Strategies

3/1/2024

#spark #optimization #bigdata

Understanding Hash vs Range partitioning to optimize parallelism and balance workloads in Spark RDDs.

Spark: Partitions

RDDs are stored in partitions. The programmer specifies the number of partitions for an RDD (a default value is used if unspecified). More partitions mean more parallelism but also more overhead.

When performing computations on RDDs, these partitions can be operated on in parallel.
You get better parallelism when the partitions are balanced.
When RDDs are first created, the partitions are balanced.
However, partitions may get out of balance after...
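
To make the balance point concrete, here is a minimal sketch of hash and range partitioning in plain Python, without Spark. The helper names (`hash_partition`, `range_partition`, `partition_sizes`), the boundary values, and the sample keys are all assumptions made for illustration; Spark's own partitioners use their own hash function and choose range boundaries by sampling, so this only mirrors the idea.

```python
from bisect import bisect_right
from collections import Counter


def hash_partition(keys, num_partitions):
    """Return the partition index for each key: hash(key) modulo the partition count."""
    return [hash(key) % num_partitions for key in keys]


def range_partition(keys, boundaries):
    """Return the partition index for each key given sorted boundaries.

    With boundaries [b0, b1, ...], keys below b0 go to partition 0,
    keys in [b0, b1) go to partition 1, and so on.
    """
    return [bisect_right(boundaries, key) for key in keys]


def partition_sizes(assignments, num_partitions):
    """Count how many keys landed in each partition."""
    counts = Counter(assignments)
    return [counts[p] for p in range(num_partitions)]


# Hypothetical data: 100 distinct integer keys, then the same keys plus one hot key.
uniform_keys = list(range(100))
skewed_keys = uniform_keys + [7] * 50

# Hypothetical boundaries splitting 0..99 into four equal ranges.
boundaries = [25, 50, 75]

for name, keys in [("uniform", uniform_keys), ("skewed", skewed_keys)]:
    print(name, "hash :", partition_sizes(hash_partition(keys, 4), 4))
    print(name, "range:", partition_sizes(range_partition(keys, boundaries), 4))
```

With the distinct keys, both schemes spread the data evenly. Once one key is heavily repeated, every copy of it maps to the same partition under either scheme, which is one way partitions can drift out of balance.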


Let's Connect

Exploring the intersection of design, code, and narrative. Always open for interesting collaborations.

GitHub LinkedIn
