In distributed TensorFlow, identifying nodes without domain name collisions is a major challenge. CoreDNS supports the DNS Name Server Identifier (NSID) option, which allows a DNS server to identify itself. We can therefore deploy CoreDNS on every node in a distributed TensorFlow cluster to solve this problem. There are two ways to achieve this goal: one is to set up a distributed key-value store such as ZooKeeper or etcd; the other is to assign each node an order based on a timestamp. My GSoC work aims to implement one of these approaches.
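
As a rough illustration of the timestamp-based approach, the sketch below orders nodes by the time they joined and derives a collision-free identifier from each node's rank. It is only a minimal sketch under stated assumptions, not the project's implementation: the function name `assign_node_ids`, the `node-<rank>` naming scheme, and the use of the node address as a tie-breaker are all hypothetical.

```python
from typing import Dict


def assign_node_ids(join_times: Dict[str, float], prefix: str = "node") -> Dict[str, str]:
    """Map each node address to a unique identifier based on join order.

    join_times maps a node address to its join timestamp (hypothetical input).
    """
    # Sort by timestamp first; break ties by address so that every node
    # computes the same order even when two timestamps are equal.
    ordered = sorted(join_times.items(), key=lambda item: (item[1], item[0]))
    # The rank in the sorted order becomes the suffix of the identifier.
    return {addr: f"{prefix}-{rank}" for rank, (addr, _) in enumerate(ordered)}


if __name__ == "__main__":
    nodes = {"10.0.0.2": 5.0, "10.0.0.1": 5.0, "10.0.0.3": 1.0}
    for addr, node_id in assign_node_ids(nodes).items():
        print(addr, node_id)
```

An identifier produced this way could, for example, serve as the string each node's CoreDNS instance reports as its NSID. Note that the tie-breaker matters: timestamps alone do not guarantee a unique order.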

Student

Jiacheng Xu

Mentors

  • Miek Gieben
  • Yong Tang
  • John Belamaric

2018