Ping An Group
27 Mar 2020

Ping An Top-Ranked in Stanford Question Answering Dataset 2.0, Exceeding Average Human Performance

(Hong Kong, Shanghai, 27 March 2020) Ping An Insurance (Group) Company of China, Ltd. (hereafter “Ping An” or the “Group”, HKEx:2318; SSE:601318) announced that Ping An Technology (Shenzhen) Co., Ltd. (Ping An Technology) has taken the top rank in the Stanford Question Answering Dataset 2.0 (SQuAD 2.0) of Stanford University, an internationally recognized test of machine reading comprehension. Ping An Technology’s scores exceeded average human performance. This is the third time that Ping An Technology has topped this competition.


SQuAD is widely recognized across the artificial intelligence industry. The SQuAD 1.1 test included more than 100,000 question-answer pairs based on more than 500 Wikipedia articles. SQuAD 2.0 is more challenging: it adds 50,000 new human-written questions that look similar to the original questions but cannot be answered from the selected texts. The machine reading comprehension models submitted by the participating teams must therefore determine whether each question can be answered by reading the article. If a question cannot be answered, the model must abstain from answering.
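The abstention requirement described above can be illustrated with a minimal sketch. It assumes a model that returns a score for its best answer span and a separate score for "no answer", and it abstains when the no-answer score leads by more than a threshold. The `Prediction` class, the `decide` function and the default threshold are hypothetical names chosen for illustration; this is a common approach for SQuAD 2.0 systems in general, not a description of Ping An Technology's model.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical model output for one question (illustrative only)."""
    answer_text: str   # best candidate span found in the article
    span_score: float  # model's score for that span
    null_score: float  # model's score for "this question has no answer"


def decide(pred: Prediction, threshold: float = 0.0) -> str:
    """Return the answer text, or an empty string to abstain.

    The model abstains when the no-answer score exceeds the best span
    score by more than `threshold` (an assumed, tunable value).
    """
    if pred.null_score - pred.span_score > threshold:
        return ""  # abstain: treat the question as unanswerable
    return pred.answer_text


# Usage: one answerable and one unanswerable question
print(repr(decide(Prediction("Stanford University", 7.2, 1.5))))
print(repr(decide(Prediction("in 1998", 2.1, 6.4))))
```

In practice the threshold would be tuned on held-out data, since abstaining too readily costs points on answerable questions and answering too readily costs points on unanswerable ones.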

In this competition, the ensemble model of ALBERT + DAAF + Verifier submitted by Ping An Technology achieved an Exact Match (EM) score of 90.386, which counts only predictions that exactly match a standard answer, and an F1 score of 92.777, which also gives credit for partially correct answers. Both results placed Ping An first overall among global competitors. DAAF (Data Augmentation and Auxiliary Feature) is a learning framework developed by Ping An that played a key role in the test. The framework contains a forward algorithm and a backward algorithm. The forward algorithm draws in external data to augment the training data, and the backward algorithm filters out data that has a negative impact on that augmentation.
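To make the two metrics concrete, the sketch below computes EM and F1 for a single prediction against a single standard answer. It follows the general approach of SQuAD-style evaluation (lowercase the text, strip punctuation and the articles "a", "an" and "the", then compare tokens), but it is a simplified illustration rather than the official scoring script; the function names are chosen here for illustration. Leaderboard scores are averages over all questions, expressed as percentages.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> int:
    """1 if the normalized strings are identical, otherwise 0."""
    return int(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1, which rewards partial overlap with the reference."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # An empty string stands for "no answer": full credit only if both
    # the prediction and the reference are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Usage: a partially correct answer scores 0 on EM but earns F1 credit
print(exact_match("in Paris, France", "Paris"))
print(f1_score("in Paris, France", "Paris"))
```

A prediction that contains the right answer plus extra words fails the exact-match test but still earns partial F1 credit, which is why a system's F1 score is never lower than its EM score.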

Both of Ping An’s scores exceeded the average human performance reported for SQuAD 2.0. The EM score of 90.386 was 3.56 percentage points above the human score, and the F1 score of 92.777 was 3.33 percentage points above it.
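The human scores implied by these figures can be recovered with simple subtraction, as in the short sketch below. The results are approximate because the stated margins are rounded to two decimal places; the variable names are chosen here for illustration.

```python
# Scores and margins as stated in this announcement
ping_an = {"EM": 90.386, "F1": 92.777}
margin = {"EM": 3.56, "F1": 3.33}  # percentage points above human performance

# Implied human score = Ping An score minus the stated margin
implied_human = {metric: ping_an[metric] - margin[metric] for metric in ping_an}

for metric, value in implied_human.items():
    print(f"Implied human {metric}: about {value:.2f}")
```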

In previous SQuAD tests, teams from Microsoft, Google, Alibaba and other companies took turns ranking first. As of 27 March 2020, Ping An ranks first, Shanghai Jiao Tong University ranks second and Google ranks fourth in SQuAD 2.0.

