[Go] Improving Server Performance by Optimizing MongoDB Queries

The Go server we currently run uses `gRPC` for communication and `MongoDB` as its database.
So for each request we fetch data from the DB, map it into Protobuf messages, and return it in the Response.

Recently our MAU rose sharply and the amount of data fetched for certain profiles grew dramatically, which degraded performance, so we refactored the code.

The slowest part was looking up the cards a user had sent to or received from other users.
For each card the server looks up the Sender and Receiver profiles, and that is where we found a few problems:

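1. No referencing for the data inside the users collection

- We only need specific profile Objects inside Profiles, so the query does not need to look at all the data, yet it still pays that unnecessary cost

2. Many DB calls from fetching items one by one in a for-range loop, which degrades performance

- Loading the data with a single query can improve performance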

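To solve this, we first built a users-log collection separate from the users collection.
In the users collection, every user must have exactly one default profile and may have up to two additional profiles.
Applying MongoDB's referencing concept, we use the users-log collection to index the _id values of those profile Objects,
so that each document stores only the minimum values needed.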

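type UserLog struct {
	Id        string    `bson:"_id" json:"id"`                                    // same _id as the user's document in users
	Ids       []string  `bson:"ids" json:"ids"`                                   // ids of that user's profiles (default + up to 2 extra)
	DeletedAt time.Time `bson:"deleted_at,omitempty" json:"deleted_at,omitempty"` // set on soft delete
}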

Also, because user withdrawal only performs a logical (soft) delete, the values in the two collections can always be kept consistent.
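To make the lookup direction concrete, here is a minimal in-memory sketch of what the users-log query resolves: given a batch of profile ids, find the ids of the users who own them. `ownersOf` and the sample data are hypothetical, and a plain Go slice stands in for the collection, so this only imitates the semantics of a `$in` filter on the `ids` array, not MongoDB itself.

```go
package main

import "fmt"

// UserLog mirrors the users-log document from the post (assumed shape):
// Id is the owning user's _id and Ids holds that user's profile ids.
type UserLog struct {
	Id  string
	Ids []string
}

// ownersOf returns the Id of every document whose Ids contains at least one
// of profileIds, in slice order. It imitates in memory what the filter
// {"ids": {"$in": profileIds}} with a projection on _id would return.
func ownersOf(logs []UserLog, profileIds []string) []string {
	wanted := make(map[string]struct{}, len(profileIds))
	for _, id := range profileIds {
		wanted[id] = struct{}{}
	}
	var owners []string
	for _, doc := range logs {
		for _, pid := range doc.Ids {
			if _, ok := wanted[pid]; ok {
				owners = append(owners, doc.Id)
				break // a matching document is returned once, however many ids match
			}
		}
	}
	return owners
}

func main() {
	logs := []UserLog{
		{Id: "u1", Ids: []string{"p1", "p2"}},
		{Id: "u2", Ids: []string{"p3"}},
		{Id: "u3", Ids: []string{"p4", "p5", "p6"}},
	}
	// p1 and p2 both belong to u1, so u1 appears only once.
	fmt.Println(ownersOf(logs, []string{"p2", "p1", "p6"})) // [u1 u3]
}
```

Because one user owns up to three profile ids, a batch of profile ids usually collapses into fewer user ids, which is what keeps the follow-up query on the users collection small.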

First, here is the original approach:

func createProfileMap(profileIds []string) (map[string]*pb.Profile, error) {
	profileMap := make(map[string]*pb.Profile)

	// One query per id: N profile ids means N separate DB round trips.
	for _, profileId := range profileIds {
		profile, err := getProfileInfo(profileId)
		if err != nil {
			return nil, err
		}
		profileMap[profileId] = profile
	}
	return profileMap, nil
}

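We collected every profileId from the cards, then iterated over them with for range and ran a separate query for each one.
It is a very inefficient approach, the kind of misstep a junior developer makes by attacking the problem only at the surface. (joke)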
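To show why the per-id loop hurts, here is a toy sketch that counts round trips. `fakeStore`, `getOne`, and `getMany` are hypothetical stand-ins for a collection, a single-document query, and one `$in`-style batch query; real latency depends on the network and the database, so only the call count is meaningful here.

```go
package main

import "fmt"

// fakeStore stands in for a database collection and counts how many
// calls (round trips) are made against it. Hypothetical, for illustration.
type fakeStore struct {
	profiles map[string]string // profile id -> profile name
	calls    int
}

// getOne simulates one query per id, like getProfileInfo inside the loop.
func (s *fakeStore) getOne(id string) (string, bool) {
	s.calls++
	name, ok := s.profiles[id]
	return name, ok
}

// getMany simulates a single batched query for all ids at once.
func (s *fakeStore) getMany(ids []string) map[string]string {
	s.calls++
	out := make(map[string]string, len(ids))
	for _, id := range ids {
		if name, ok := s.profiles[id]; ok {
			out[id] = name
		}
	}
	return out
}

func main() {
	ids := []string{"p1", "p2", "p3", "p4"}
	store := &fakeStore{profiles: map[string]string{"p1": "a", "p2": "b", "p3": "c", "p4": "d"}}

	for _, id := range ids {
		store.getOne(id)
	}
	fmt.Println("per-id loop calls:", store.calls) // per-id loop calls: 4

	store.calls = 0
	store.getMany(ids)
	fmt.Println("batched calls:", store.calls) // batched calls: 1
}
```

With N profile ids the loop makes N calls while the batch makes one, and each call pays the full network round trip to the database.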

After that, I rewrote a large part of the code:

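func createProfileMap(profileIds []string) (map[string]*pb.Profile, error) {
	profileMap := make(map[string]*pb.Profile)

	var validProfileIds []string
	// 1. Check each profileId; an empty id gets an empty placeholder profile
	for _, profileId := range profileIds {
		if profileId == "" {
			profileMap[profileId] = &pb.Profile{
				Id:     "",
				Type:   pb.EProfileType_EPT_UNSPECIFIED,
				Name:   "",
				Gender: pb.EGender_EG_UNSPECIFIED,
			}
		} else {
			validProfileIds = append(validProfileIds, profileId)
		}
	}
	// Nothing to look up. This also avoids passing a nil slice to $in:
	// by default the Go driver encodes a nil slice as BSON null, which $in rejects.
	if len(validProfileIds) == 0 {
		return profileMap, nil
	}

	// 2. Query the usersLog collection with validProfileIds to get the related _id values
	logIds, err := fetchUserLogIds(validProfileIds)
	if err != nil {
		return nil, err
	}
	// No matching users-log documents: skip the second query.
	if len(logIds) == 0 {
		return profileMap, nil
	}

	// 3. Fetch the user profiles by those _id values
	profiles, err := fetchProfiles(logIds)
	if err != nil {
		return nil, err
	}

	// 4. Key the profiles by id so they can be matched against profileIds, then return
	for _, profile := range profiles {
		profileMap[profile.Id] = profile
	}
	return profileMap, nil
}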
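One possible refinement to step 1, which is not part of the original code: cards often share the same sender or receiver, so the same profile id can appear several times in `profileIds`. The hypothetical helper `splitProfileIds` drops duplicates before they reach the `$in` query and only reports whether a blank id was seen, since every blank id maps to the same `""` key in `profileMap` anyway.

```go
package main

import "fmt"

// splitProfileIds separates blank ids from valid ones and removes duplicate
// valid ids while keeping their first-seen order. Hypothetical helper.
func splitProfileIds(profileIds []string) (valid []string, hasBlank bool) {
	seen := make(map[string]struct{}, len(profileIds))
	for _, id := range profileIds {
		if id == "" {
			hasBlank = true // one placeholder profile covers every blank id
			continue
		}
		if _, dup := seen[id]; dup {
			continue // already queued for the $in list
		}
		seen[id] = struct{}{}
		valid = append(valid, id)
	}
	return valid, hasBlank
}

func main() {
	valid, hasBlank := splitProfileIds([]string{"p1", "", "p2", "p1", ""})
	fmt.Println(valid, hasBlank) // [p1 p2] true
}
```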

The parts I focused on here are the `fetchUserLogIds` and `fetchProfiles` functions. Let's look at fetchUserLogIds first.

func fetchUserLogIds(profileIds []string) ([]string, error) {
	innerContext, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	filter := bson.M{
		"ids": bson.M{"$in": profileIds},
	}
	// Projection: return only _id; every other field is left out of the result.
	findOptions := options.Find().SetProjection(bson.M{"_id": 1})

	cursor, err := mongodb.UserLogColl.Find(innerContext, filter, findOptions)
	if err != nil {
		return nil, err
	}
	defer cursor.Close(innerContext)

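	// cursor.All decodes every remaining document into userLogs and closes the cursor.
	var userLogs []userdoc.UserLog
	if err = cursor.All(innerContext, &userLogs); err != nil {
		return nil, err
	}

	// Preallocate; a non-nil empty slice also encodes as an empty array, not null, in a later $in.
	userLogIds := make([]string, 0, len(userLogs))
	for _, userLog := range userLogs {
		userLogIds = append(userLogIds, userLog.Id)
	}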

	return userLogIds, nil
}

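Since the only value we need is the _id field, SetProjection restricts the returned documents to that field. Find returns a cursor over the matching documents, and the cursor is closed once the work is done so its resources are released.

All results are decoded into the userLogs slice, which is then iterated with for range to collect each _id into userLogIds.

Instead of querying each id separately in a loop, the $in operator fetches everything needed in a single query, which cuts the number of DB calls and improves performance.
SetProjection also limits the query to the fields we actually need, which reduces network load.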
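As a rough illustration of what the `{"_id": 1}` projection saves per document, this sketch compares the encoded size of a full document with the projected shape. It is an assumption-laden stand-in: JSON from the standard library replaces BSON, and `fullUserLog`, `idOnly`, and `encodedSizes` are hypothetical names, so the absolute byte counts will differ from what the driver actually sends.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fullUserLog is a stand-in for a complete users-log document.
type fullUserLog struct {
	Id  string   `json:"_id"`
	Ids []string `json:"ids"`
}

// idOnly is the shape left after projecting {"_id": 1}.
type idOnly struct {
	Id string `json:"_id"`
}

// encodedSizes compares the encoded size of a full document with its
// projected form. JSON is used only as a stdlib stand-in for BSON.
func encodedSizes() (full, projected int, err error) {
	doc := fullUserLog{Id: "u1", Ids: []string{"p1", "p2", "p3"}}
	a, err := json.Marshal(doc)
	if err != nil {
		return 0, 0, err
	}
	b, err := json.Marshal(idOnly{Id: doc.Id})
	if err != nil {
		return 0, 0, err
	}
	return len(a), len(b), nil
}

func main() {
	full, projected, err := encodedSizes()
	if err != nil {
		panic(err)
	}
	fmt.Printf("full: %d bytes, projected: %d bytes\n", full, projected)
}
```

The per-document saving is small here, but it is multiplied by every document the `$in` query matches.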

The fetchProfiles function has nothing unusual about it, except for how it converts each users document into profiles:

var profiles []*pb.Profile
for _, user := range users {
	// Resolve the gender enum once per user and reuse it for every profile.
	gender := pb.EGender(pb.EGender_value[user.Gender])

	defaultProfile, err := user.DefaultProfile.ToPB(gender)
	if err != nil {
		return nil, err
	}
	profiles = append(profiles, defaultProfile)

	// The anonymous and crush profiles are optional.
	if user.AnonymousProfile != nil {
		anonymousProfile, err := user.AnonymousProfile.ToPB(gender)
		if err != nil {
			return nil, err
		}
		profiles = append(profiles, anonymousProfile)
	}
	if user.CrushProfile != nil {
		crushProfile, err := user.CrushProfile.ToPB(gender)
		if err != nil {
			return nil, err
		}
		profiles = append(profiles, crushProfile)
	}
}

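Previously, the gender field and the profiles were fetched from the users collection separately, so two operations were performed where one would do. Now each document is fetched from the users collection once, and its profiles are converted to protobuf according to their profileType before being returned, which makes the lookup much more efficient.
MongoDB queries: the more you use them, the more they earn your respect ~~the prevailing theory is that we had just been using them wrong~~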
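The one-pass conversion can also be sketched without the protobuf and MongoDB dependencies. Everything here is hypothetical: `User`, `ProfileDoc`, `Profile`, `Gender`, and `genderValue` stand in for the users document, the generated `pb` types, and `pb.EGender_value`, whose real definitions the post does not show. The point is that one users document yields the default profile plus any optional anonymous or crush profile, with the gender resolved once.

```go
package main

import "fmt"

// Gender is a hypothetical stand-in for the generated pb.EGender enum.
type Gender int

const (
	GenderUnspecified Gender = iota
	GenderMale
	GenderFemale
)

// genderValue plays the role of pb.EGender_value (string name -> enum value).
var genderValue = map[string]Gender{"MALE": GenderMale, "FEMALE": GenderFemale}

// Profile stands in for *pb.Profile; ProfileDoc for a profile stored in users.
type Profile struct {
	Id     string
	Name   string
	Gender Gender
}

type ProfileDoc struct {
	Id   string
	Name string
}

func (p *ProfileDoc) toProfile(g Gender) *Profile {
	return &Profile{Id: p.Id, Name: p.Name, Gender: g}
}

// User stands in for one users document with its embedded profiles.
type User struct {
	Gender           string
	DefaultProfile   ProfileDoc
	AnonymousProfile *ProfileDoc // optional
	CrushProfile     *ProfileDoc // optional
}

// profilesOf converts one users document into all of its profiles,
// resolving the gender once and reusing it for every profile.
func profilesOf(u User) []*Profile {
	g := genderValue[u.Gender] // unknown name -> GenderUnspecified (zero value)
	out := []*Profile{u.DefaultProfile.toProfile(g)}
	for _, extra := range []*ProfileDoc{u.AnonymousProfile, u.CrushProfile} {
		if extra != nil {
			out = append(out, extra.toProfile(g))
		}
	}
	return out
}

func main() {
	u := User{
		Gender:         "FEMALE",
		DefaultProfile: ProfileDoc{Id: "p1", Name: "default"},
		CrushProfile:   &ProfileDoc{Id: "p3", Name: "crush"},
	}
	for _, p := range profilesOf(u) {
		fmt.Println(p.Id, p.Name, p.Gender)
	}
}
```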

That is how the refactoring went. As for how much faster it got:
the average lookup time dropped from 4000ms to 80ms.
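Putting the two reported averages side by side:

$$
\text{speedup} = \frac{4000\,\text{ms}}{80\,\text{ms}} = 50, \qquad \text{reduction} = \frac{4000 - 80}{4000} = 0.98 = 98\%
$$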

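The work didn't take very long, but it showed me that the same query can perform far better depending on how it is used. Going forward, I want to approach the problems in front of me from more angles and give users a better service experience.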
