This thesis tackles the high computational and memory demands of Transformers, proposing “Green AI” methods that reduce these costs without sacrificing performance. Key techniques include network quantization, which represents weights and activations at lower numerical precision (for example, 8-bit integers instead of 32-bit floats), shrinking memory footprint and compute cost and making AI models deployable across a wider range of platforms. The two main approaches, quantization-aware training (QAT) and post-training quantization (PTQ), offer distinct advantages: QAT simulates quantization during training so the network can adapt to it, yielding better accuracy, while PTQ quantizes an already-trained network without retraining, making it the cost-effective choice for larger models. Novel methods are introduced for both QAT and PTQ to minimize the performance loss incurred by quantization, supporting efficient and sustainable AI applications.
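To make the core operation concrete, the sketch below (not taken from the thesis; a minimal NumPy illustration assuming symmetric, per-tensor uniform quantization with a scale derived from the maximum absolute value) shows how a weight matrix can be mapped to 8-bit integers and back. This quantize/dequantize step is the building block that both PTQ and QAT rely on; QAT additionally inserts the simulated rounding into the training graph so the network learns to tolerate it.

```python
import numpy as np

def symmetric_uniform_quantize(x, num_bits=8):
    """Quantize a float tensor to signed integers with a per-tensor scale.

    The scale is derived from the tensor's maximum absolute value, a common
    choice for post-training quantization (PTQ) of weights. This is a generic
    illustration, not the specific method proposed in the thesis.
    """
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for 8 bits
    scale = np.max(np.abs(x)) / qmax          # map the float range onto the integer grid
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map integer codes back to approximate float values."""
    return q.astype(np.float32) * scale

# Example: quantize a random "weight matrix" and inspect the rounding error.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(4, 8)).astype(np.float32)
q, scale = symmetric_uniform_quantize(w, num_bits=8)
w_hat = dequantize(q, scale)
print("mean abs quantization error:", np.mean(np.abs(w - w_hat)))
```

In a PTQ setting this mapping is applied once to a trained model; in a QAT setting the round-trip (quantize followed by dequantize) is applied inside the forward pass during training, typically with a straight-through gradient estimator, which is why QAT generally recovers more accuracy at the price of a full training run.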